ID variable based on a combination of variables

Date: 2017-05-23 16:04:13

Tags: r dataframe data-manipulation

I have a data frame laid out as follows:

df <- structure(list(NAME1=  c("AAA","CCC","BBB","BBB"), 
                 NAME2    =  c("BBB", "AAA","DDD","AAA"),
                 AMT      = c(40,20,10,50)),.Names=c("NAME1","NAME2","AMT"), 
                 row.names = c("1", "2", "3", "4"), class =("data.frame"))

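I want to create an ID variable from the combination of the character variables NAME1 and NAME2, regardless of order (i.e., AAA BBB is the same as BBB AAA), and sum AMT within each ID.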

This is the result I want:

df <- structure(list(NAME1 =  c("AAA","CCC", "BBB"), 
                 NAME2     =  c("BBB", "AAA","DDD"),
                 AMT       =  c(90,20,10),
                 ID        =  c(1,2,3)),
                 .Names    =  c("NAME1","NAME2","AMT","ID"), 
                 row.names =  c("1", "2", "3"), class =("data.frame"))

Any input would be greatly appreciated.

2 answers:

Answer 0: (score: 2)

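You can create two new grouping variables that hold the values sorted within each row, so that AAA, BBB and BBB, AAA are treated as the same pair (both end up in the same order). After that, the grouping operation is straightforward. I chose to use data.table: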

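library(data.table)

# df is the input data frame from the question.
# Sort NAME1/NAME2 within each row so both orderings of a pair match.
df[, c("NAME1_o", "NAME2_o")] <- t(apply(cbind(df$NAME1, df$NAME2), 1, sort))
# Sum AMT per unordered pair; .GRP numbers the groups in order of appearance.
setDT(df)[, .(AMT = sum(AMT), ID = .GRP), by = .(NAME1_o, NAME2_o)]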

#   NAME1_o NAME2_o AMT ID
#1:     AAA     BBB  90  1
#2:     AAA     CCC  20  2
#3:     BBB     DDD  10  3
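The row-wise sort above can also be sketched without apply(), assuming only two name columns: pmin() and pmax() compare the columns element by element, which yields the same alphabetically ordered pair for each row. This is a base R variant offered as an illustration, not part of the original answer; the names NAME1_o, NAME2_o and res are chosen here for the example.

```r
# Example data copied from the question
df <- data.frame(
  NAME1 = c("AAA", "CCC", "BBB", "BBB"),
  NAME2 = c("BBB", "AAA", "DDD", "AAA"),
  AMT   = c(40, 20, 10, 50),
  stringsAsFactors = FALSE
)

# pmin()/pmax() work element-wise on character vectors, so each row's
# pair is put into alphabetical order without looping over rows
df$NAME1_o <- pmin(df$NAME1, df$NAME2)
df$NAME2_o <- pmax(df$NAME1, df$NAME2)

# Sum AMT within each unordered pair
res <- aggregate(AMT ~ NAME1_o + NAME2_o, data = df, FUN = sum)

# Number the groups in the order aggregate() returns them
res$ID <- seq_len(nrow(res))
res
```

Note that the ID numbering here follows the row order returned by aggregate(), which may differ from the order of first appearance that .GRP gives in data.table.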

Answer 1: (score: 0)

Using dplyr verbs (separate() comes from tidyr):

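library(dplyr)
library(tidyr)

df %>% 
   rowwise() %>% 
   # Build an order-independent key such as "AAA BBB" for each row
   mutate(PAIR = paste0(sort(c(NAME1, NAME2)), collapse = " ")) %>% 
   group_by(PAIR) %>% 
   summarise(AMT = sum(AMT)) %>%
   mutate(ID = row_number()) %>%
   separate(PAIR, into = c("NAME1", "NAME2"), sep = " ")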

  NAME1 NAME2   AMT    ID
1   AAA   BBB    90     1
2   AAA   CCC    20     2
3   BBB   DDD    10     3
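Both answers return each pair in sorted order (AAA CCC), whereas the desired output in the question keeps the orientation in which a pair first appears (CCC AAA). A minimal sketch of one way to get that with dplyr, assuming it is acceptable to take the first NAME1/NAME2 seen in each group; the helper columns k1 and k2 and the result name out are made up for this example.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  NAME1 = c("AAA", "CCC", "BBB", "BBB"),
  NAME2 = c("BBB", "AAA", "DDD", "AAA"),
  AMT   = c(40, 20, 10, 50),
  stringsAsFactors = FALSE
)

out <- df %>%
  # k1/k2 hold the pair in alphabetical order and serve only as group keys
  group_by(k1 = pmin(NAME1, NAME2), k2 = pmax(NAME1, NAME2)) %>%
  # Keep the names as they first appear in each group and sum AMT
  summarise(NAME1 = first(NAME1),
            NAME2 = first(NAME2),
            AMT   = sum(AMT),
            .groups = "drop") %>%
  mutate(ID = row_number()) %>%
  select(NAME1, NAME2, AMT, ID)

out
```

The IDs here follow the sorted order of the group keys, not the order in which pairs first occur in the data, so they may need renumbering if first-appearance order matters.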