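I have a data.frame like the one below. For each ID, I want to sum Replace_Count over all rows where Orig is "Car" and Replace is "Bike", and vice versa (Orig is "Bike" and Replace is "Car").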
table <- data.frame(ID = c(rep("Be_01", 8), rep("Ce_02", 5)),
                    Orig = c("Car", "Bus", "Truck", "Car", "Bus", "Car", "Bike", "Truck", "Car", "Truck", "Bus", "Bike", "Bike"),
                    Orig_counts = c(5, 9, 8, 10, 14, 4, 8, 6, 10, 3, 9, 10, 6),
                    Replace = c("Bike", "Truck", "Bus", "Truck", "Truck", "Bike", "Car", "Bus", "Bike", "Bike", "Truck", "Car", "Car"),
                    Replace_Count = c(9, 4, 2, 7, 10, 11, 12, 6, 7, 5, 9, 4, 2))
> table
ID     Orig   Orig_counts  Replace  Replace_Count
Be_01  Car    5            Bike     9
Be_01  Bus    9            Truck    4
Be_01  Truck  8            Bus      2
Be_01  Car    10           Truck    7
Be_01  Bus    14           Truck    10
Be_01  Car    4            Bike     11
Be_01  Bike   8            Car      12
Be_01  Truck  6            Bus      6
Ce_02  Car    10           Bike     7
Ce_02  Truck  3            Bike     5
Ce_02  Bus    9            Truck    9
Ce_02  Bike   10           Car      4
Ce_02  Bike   6            Car      2
Is it possible to do this with the aggregate function in R?
Answer 0 (score: 2)
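You can do this with split-apply-combine. Here is a base R solution that uses the split function to split the data frame by ID, the lapply function to summarize each ID-specific subset of the data, and do.call with the rbind function to combine the per-ID summaries.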
# Split by ID, build a one-row summary per ID, then stack the rows
do.call(rbind, lapply(split(table, table$ID), function(x) {
  data.frame(ID = x$ID[1],
             Bike_and_Cars = sum(x$Replace_Count[x$Orig == "Bike" & x$Replace == "Car"]),
             Cars_and_Bike = sum(x$Replace_Count[x$Orig == "Car" & x$Replace == "Bike"]))
}))
#          ID Bike_and_Cars Cars_and_Bike
# Be_01 Be_01            12            20
# Ce_02 Ce_02             6             7
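Since the question asks specifically about aggregate(), here is a minimal sketch (not taken from either answer) that first keeps only the Car/Bike rows with subset() and then sums Replace_Count per ID and direction. The names car_bike and result are illustrative only.

```r
# Example data from the question
table <- data.frame(
  ID = c(rep("Be_01", 8), rep("Ce_02", 5)),
  Orig = c("Car", "Bus", "Truck", "Car", "Bus", "Car", "Bike", "Truck",
           "Car", "Truck", "Bus", "Bike", "Bike"),
  Orig_counts = c(5, 9, 8, 10, 14, 4, 8, 6, 10, 3, 9, 10, 6),
  Replace = c("Bike", "Truck", "Bus", "Truck", "Truck", "Bike", "Car", "Bus",
              "Bike", "Bike", "Truck", "Car", "Car"),
  Replace_Count = c(9, 4, 2, 7, 10, 11, 12, 6, 7, 5, 9, 4, 2)
)

# Keep only the Car -> Bike and Bike -> Car rows (hypothetical helper name)
car_bike <- subset(table, (Orig == "Car" & Replace == "Bike") |
                          (Orig == "Bike" & Replace == "Car"))

# Sum Replace_Count for each ID and each direction
result <- aggregate(Replace_Count ~ ID + Orig + Replace, data = car_bike, sum)
print(result)
```

Unlike the split/lapply version, aggregate() only returns groups that actually occur, so an ID with no Car/Bike rows would be missing from the result instead of appearing with a sum of 0.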
Answer 1 (score: 2)
This doesn't answer the exact question you asked, but you might be better served by a more general approach:
> aggregate(Replace_Count ~ ID + Orig + Replace, data=table, sum)
ID Orig Replace Replace_Count
1 Be_01 Car Bike 20
2 Ce_02 Car Bike 7
3 Ce_02 Truck Bike 5
4 Be_01 Truck Bus 8
5 Be_01 Bike Car 12
6 Ce_02 Bike Car 6
7 Be_01 Bus Truck 14
8 Ce_02 Bus Truck 9
9 Be_01 Car Truck 7
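From here it is fairly easy to extract the subset of the data you are interested in. One way is to create a combined column, for example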
table$Move <- with(table, paste0(Orig,"_and_",Replace))
and then use tidyr to spread the data out (you could also use reshape2):
library(tidyr)
spread(aggregate(Replace_Count ~ ID + Move, data=table, sum), Move, Replace_Count)
     ID Bike_and_Car Bus_and_Truck Car_and_Bike Car_and_Truck Truck_and_Bike Truck_and_Bus
1 Be_01           12            14           20             7             NA             8
2 Ce_02            6             9            7            NA              5            NA
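If you would rather stay in base R, xtabs() can do the summing and the spreading in a single step. The sketch below reuses the same Move column; wide and wide_df are illustrative names. Note that combinations that never occur come out as 0 rather than NA. (In current versions of tidyr, spread() is superseded by pivot_wider().)

```r
# Example data from the question
table <- data.frame(
  ID = c(rep("Be_01", 8), rep("Ce_02", 5)),
  Orig = c("Car", "Bus", "Truck", "Car", "Bus", "Car", "Bike", "Truck",
           "Car", "Truck", "Bus", "Bike", "Bike"),
  Orig_counts = c(5, 9, 8, 10, 14, 4, 8, 6, 10, 3, 9, 10, 6),
  Replace = c("Bike", "Truck", "Bus", "Truck", "Truck", "Bike", "Car", "Bus",
              "Bike", "Bike", "Truck", "Car", "Car"),
  Replace_Count = c(9, 4, 2, 7, 10, 11, 12, 6, 7, 5, 9, 4, 2)
)

# Combined "Orig_and_Replace" label, as in the answer above
table$Move <- with(table, paste0(Orig, "_and_", Replace))

# xtabs() with a left-hand side sums Replace_Count for every ID x Move cell
wide <- xtabs(Replace_Count ~ ID + Move, data = table)
print(wide)

# Convert the 2-D table to an ordinary data.frame (IDs become row names)
wide_df <- as.data.frame.matrix(wide)
```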