I have a data frame laid out as follows:
df <- structure(list(NAME1= c("AAA","CCC","BBB","BBB"),
NAME2 = c("BBB", "AAA","DDD","AAA"),
AMT = c(40,20,10,50)),.Names=c("NAME1","NAME2","AMT"),
row.names = c("1", "2", "3", "4"), class =("data.frame"))
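I want to create an ID variable from the combination of the character variables NAME1 and NAME2, regardless of order (i.e. AAA BBB is the same as BBB AAA), and sum AMT within each combination.
This is the result I want: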
df <- structure(list(NAME1 = c("AAA","CCC", "BBB"),
NAME2 = c("BBB", "AAA","DDD"),
AMT = c(90,20,10),
ID = c(1,2,3)),
.Names = c("NAME1","NAME2","AMT","ID"),
row.names = c("1", "2", "3"), class =("data.frame"))
Any input would be greatly appreciated.
Answer 0 (score: 2)
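You can create two new grouping variables that sort the values within each row, so that AAA, BBB and BBB, AAA are treated the same (after sorting, both are in the same order). After that, the grouping operation is straightforward. I chose to use data.table: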
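library(data.table)
# Sort NAME1/NAME2 within each row, so "BBB AAA" becomes "AAA BBB"
df[,c("NAME1_o","NAME2_o")] <- t(apply(cbind(df$NAME1, df$NAME2), 1, function(x) x[order(x)]))
# Convert to a data.table, sum AMT per sorted pair; .GRP numbers the groups 1, 2, 3, ...
setDT(df)[, .(AMT = sum(AMT), ID = .GRP), by = .(NAME1_o, NAME2_o)]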
# NAME1_o NAME2_o AMT ID
#1: AAA BBB 90 1
#2: AAA CCC 20 2
#3: BBB DDD 10 3
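A vectorised variant of the same idea in base R, offered as a sketch rather than as part of the answer above: pmin() and pmax() compare the two columns element by element, which puts every pair in alphabetical order without a per-row apply() call, and aggregate() then sums AMT per pair. It assumes NAME1 and NAME2 are character columns (pmin() on unordered factors fails); the intermediate name pairs is only illustrative.

```r
# Data from the question (character columns, not factors)
df <- data.frame(
  NAME1 = c("AAA", "CCC", "BBB", "BBB"),
  NAME2 = c("BBB", "AAA", "DDD", "AAA"),
  AMT   = c(40, 20, 10, 50),
  stringsAsFactors = FALSE
)

# pmin()/pmax() work element-wise, so each row's pair is put in
# alphabetical order in one vectorised step
pairs <- data.frame(
  NAME1 = pmin(df$NAME1, df$NAME2),
  NAME2 = pmax(df$NAME1, df$NAME2),
  AMT   = df$AMT,
  stringsAsFactors = FALSE
)

# Sum AMT within each unordered pair, then number the groups
res <- aggregate(AMT ~ NAME1 + NAME2, data = pairs, FUN = sum)
res$ID <- seq_len(nrow(res))
res
```

Like the data.table answer, this reports each pair in sorted form (AAA CCC rather than CCC AAA).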
Answer 1 (score: 0)
Using dplyr verbs (plus separate() from tidyr):
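library(dplyr)
library(tidyr)  # separate() comes from tidyr, not dplyr

df %>%
  rowwise() %>%
  # Build an order-independent key such as "AAA BBB"
  mutate(PAIR = paste0(sort(c(NAME1, NAME2)), collapse = " ")) %>%
  group_by(PAIR) %>%
  summarise(AMT = sum(AMT)) %>%
  mutate(ID = row_number()) %>%
  # Split the key back into two columns
  separate(PAIR, into = c("NAME1", "NAME2"), sep = " ")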
NAME1 NAME2 AMT ID
1 AAA BBB 90 1
2 AAA CCC 20 2
3 BBB DDD 10 3
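Both answers return each pair in sorted form (e.g. AAA CCC), whereas the expected output in the question keeps the orientation of each pair's first occurrence (CCC AAA) and numbers the pairs in order of appearance. A minimal base-R sketch that reproduces that layout, assuming character columns and IDs assigned by first appearance; the names key, grp and res are illustrative only:

```r
# Data from the question (character columns, not factors)
df <- data.frame(
  NAME1 = c("AAA", "CCC", "BBB", "BBB"),
  NAME2 = c("BBB", "AAA", "DDD", "AAA"),
  AMT   = c(40, 20, 10, 50),
  stringsAsFactors = FALSE
)

# Order-independent key: CCC/AAA and AAA/CCC both give "AAA CCC"
key <- paste(pmin(df$NAME1, df$NAME2), pmax(df$NAME1, df$NAME2))

# Group number in order of first appearance (here 1, 2, 3, 1)
grp <- match(key, unique(key))

# Keep the first row of each pair, preserving its NAME1/NAME2 orientation
res <- df[!duplicated(grp), c("NAME1", "NAME2")]
# tapply() returns the sums in group order 1, 2, 3, matching the kept rows
res$AMT <- as.vector(tapply(df$AMT, grp, sum))
res$ID <- seq_len(nrow(res))
rownames(res) <- NULL
res
```

The result has the same NAME1, NAME2 and AMT values as the expected data frame in the question; the only difference is that ID is stored as an integer rather than a double.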