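Get all possible combinations of specific columns while keeping the other columns

Time: 2020-07-07 08:54:47

Tags: r dataframe

I have a data frame containing FTA data. Columns c1 to c91 hold the different countries that are party to each FTA. I want to find all possible pairwise combinations of the countries in c1 to c91 while keeping the "No", "Base_treaty" and "entry_type" columns.

Example FTA data: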

No   Base_treaty   entry_type              c1           c2         c3
3     3            treaty                Algeria      Angola       Benin    
5     5            treaty                Albania      Bulgaria      NA  
6     6            treaty                Albania      Croatia       NA

Desired output:

No   Base_treaty   entry_type              ctry1       ctry2        
3     3            treaty                Algeria      Angola       
3     3            treaty                Algeria      Benin       
3     3            treaty                Benin        Angola       
5     5            treaty                Albania      Bulgaria      
5     5            treaty                Albania         NA      
5     5            treaty                Bulgaria        NA   
6     6            treaty                Albania      Croatia   

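What I have done so far:

df <- do.call(rbind, lapply(seq_along(cols), function(i) t(combn(output[i,4:6],2))))

where output is my FTA data. This gives me the pairwise combinations of the "c" columns, but I cannot get "No", "Base_treaty" and "entry_type" copied to each pair.

It only gives me this: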

        [,1]      [,2]           
[1,] "Algeria"  "Angola"
[2,] "Algeria"  "Benin"      
[3,]  "Benin"    "Angola"       
[4,] "Albania"  "Bulgaria"  
[5,] "Albania"    "NA"           
[6,] "Bulgaria"   "NA"
       .
       . 

       .

Any help would be greatly appreciated!

2 Answers:

Answer 0: (score: 1)

Here is an idea that gets close:

library(tidyverse)

df <- tribble(
    ~No, ~Base_treaty, ~entry_type, ~c1, ~c2, ~c3
    , 3, 3, "treaty", "Algeria", "Angola", "Benin"    
    , 5, 5, "treaty", "Albania", "Bulgaria", NA  
    , 6, 6, "treaty", "Albania", "Croatia", NA)

df %>%
    pivot_longer(cols = starts_with("c")
                 , values_to = "country") %>%
    left_join(df %>%
                  pivot_longer(cols = starts_with("c")
                               , values_to = "country")
              , by = c("No", "Base_treaty")) %>%
    select(starts_with("country")) %>%
    filter(country.x > country.y &
        country.x != country.y)

  country.x country.y
  <chr>     <chr>    
1 Angola    Algeria  
2 Benin     Algeria  
3 Benin     Angola   
4 Bulgaria  Albania  
5 Croatia   Albania 

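pivot_longer() reshapes the data frame into long format, so that for each treaty all countries sit in a single column. The result is then left-joined to itself, and only the rows where country.x sorts after country.y alphabetically are kept, which leaves each pair once and removes self-pairs. Pairs involving NA disappear as well, because filter() drops rows whose condition evaluates to NA.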
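The same idea can be extended so that the identifier columns survive. This is only a minimal sketch, assuming dplyr and tidyr are installed. The names slot, ctry1 and ctry2 are made up for illustration, and joining on all three identifier columns is an assumption about what should stay fixed per treaty. Recent dplyr versions may warn about a many-to-many relationship here, which is expected for a self-join.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(
  No          = c(3, 5, 6),
  Base_treaty = c(3, 5, 6),
  entry_type  = "treaty",
  c1 = c("Algeria", "Albania", "Albania"),
  c2 = c("Angola", "Bulgaria", "Croatia"),
  c3 = c("Benin", NA, NA),
  stringsAsFactors = FALSE
)

# One row per treaty and country; empty slots are dropped up front
long <- df %>%
  pivot_longer(cols = matches("^c\\d+$"),
               names_to = "slot", values_to = "country") %>%
  filter(!is.na(country))

# Self-join within each treaty, then keep each unordered pair once
pairs <- long %>%
  inner_join(long, by = c("No", "Base_treaty", "entry_type"),
             suffix = c("1", "2")) %>%
  filter(country1 < country2) %>%
  select(No, Base_treaty, entry_type, ctry1 = country1, ctry2 = country2)

print(pairs)
```

With the example data this yields five rows: three pairs for treaty 3 and one each for treaties 5 and 6. Pairs containing NA are deliberately left out, which differs from the desired output in the question.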

Answer 1: (score: 1)

You were close. I would pass the column numbers to combn together with a function, in which we can cbind the stable columns to each combination. To find the numbers of the country columns, we can use grep with a regular expression.

do.call(rbind, combn(grep("^c\\d+$", names(output)), 2, function(x)
  cbind(output[1:3], setNames(output[x], paste0("c", 1:2))), simplify=FALSE))
#   No Base_treaty entry_type       c1       c2
# 1  3           3     treaty  Algeria   Angola
# 2  5           5     treaty  Albania Bulgaria
# 3  6           6     treaty  Albania  Croatia
# 4  3           3     treaty  Algeria    Benin
# 5  5           5     treaty  Albania     <NA>
# 6  6           6     treaty  Albania     <NA>
# 7  3           3     treaty   Angola    Benin
# 8  5           5     treaty Bulgaria     <NA>
# 9  6           6     treaty  Croatia     <NA>

The regular expression "^c\\d+$":

  • ^ start of string
  • c matches "c" literally
  • \\d+ one or more digits
  • $ end of string
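To see what the pattern selects, here is a small sketch run against a made-up vector of column names. The names "ctry" and "c1x" are hypothetical and included only to show what gets rejected.

```r
# Hypothetical column names; only c1, c2 and c91 should match
nms <- c("No", "Base_treaty", "entry_type", "c1", "c2", "c91", "ctry", "c1x")

idx <- grep("^c\\d+$", nms)                   # positions of matching names
hits <- grep("^c\\d+$", nms, value = TRUE)    # the matching names themselves

print(idx)
print(hits)
```

The anchors matter: without ^ and $, a name such as "c1x" would also match, since it contains "c" followed by a digit.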

Data: output is the example FTA data frame shown in the question.
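As a minimal sketch, the example data can be rebuilt by hand and run through the combn approach. The values are copied from the example table in the question. The data.frame() call, the ctry1/ctry2 column names and the final sort by No are assumptions for illustration, not the answerer's original listing.

```r
# Assumed reconstruction of the example FTA data from the question
output <- data.frame(
  No          = c(3, 5, 6),
  Base_treaty = c(3, 5, 6),
  entry_type  = "treaty",
  c1 = c("Algeria", "Albania", "Albania"),
  c2 = c("Angola", "Bulgaria", "Croatia"),
  c3 = c("Benin", NA, NA),
  stringsAsFactors = FALSE
)

# Positions of the country columns (c1, c2, c3, ...)
country_cols <- grep("^c\\d+$", names(output))

# For every pair of country columns, attach the three stable columns
pairs <- do.call(rbind, combn(country_cols, 2, function(x) {
  cbind(output[1:3], setNames(output[x], c("ctry1", "ctry2")))
}, simplify = FALSE))

# Group the pairs by treaty, as in the desired output
pairs <- pairs[order(pairs$No), ]
rownames(pairs) <- NULL

print(pairs)
```

This keeps pairs that contain NA, so each treaty contributes one row per pair of country columns (nine rows for the example). Filtering on !is.na(ctry1) & !is.na(ctry2) afterwards would remove the incomplete pairs if they are not wanted.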