Here is a smaller example of the kind of data I am working with:
> df <- data.frame("ID"=c("A1","A1","A1","A1","A2","A2","A2","A3","A3","A3","A3"),
"Cat"=c("corn","wheat","quarry","barley","corn","wheat","lake","corn","wheat","quarry","rye"),
"Count"=c(3,1,3,4,5,2,4,7,2,9,1))
> df
ID Cat Count
1 A1 corn 3
2 A1 wheat 1
3 A1 quarry 3
4 A1 barley 4
5 A2 corn 5
6 A2 wheat 2
7 A2 lake 4
8 A3 corn 7
9 A3 wheat 2
10 A3 quarry 9
11 A3 rye 1
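I have a few hundred different IDs, and each ID has count entries for roughly 24 different category types. Not every ID has an entry for every category. What I want to do is create a new category type that, for each unique ID, sums up a set of the other categories. For example, this would be the output for the data above: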
ID Cat Count
1 A1 crops 8
2 A1 quarry 3
3 A2 crops 7
4 A2 lake 4
5 A3 crops 10
6 A3 quarry 9
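...if I wanted to combine corn, wheat, barley, and rye into a new category "crops", while leaving quarry and lake as they are.
I have successfully used "aggregate" to produce this data frame, but I cannot find a way to create an entirely new row made up of the sum of several rows, all within a single ID.
Thanks for any input!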
Answer 0 (score: 2)
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Set "Cat" to "crops" for those elements that are not "quarry" or "lake". Then, grouped by "ID" and "Cat", we get the sum of "Count".
library(data.table)
setDT(df)[!(Cat %chin% c("quarry", "lake")), Cat := "crops"]
df[, .(Count=sum(Count)),.(ID, Cat)]
# ID Cat Count
#1: A1 crops 8
#2: A1 quarry 3
#3: A2 crops 7
#4: A2 lake 4
#5: A3 crops 10
#6: A3 quarry 9
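Note that setDT() and := both work by reference, so after the code above the original df has its "Cat" column overwritten. If the original categories are still needed, the following is a minimal sketch of a non-destructive variant, assuming the crop categories are listed explicitly as in the question (the names crop_types, dt, and res are made up for illustration, and fifelse() assumes a reasonably recent data.table version):

```r
library(data.table)

# Example data from the question, with Cat kept as character
df <- data.frame(
  ID    = c("A1","A1","A1","A1","A2","A2","A2","A3","A3","A3","A3"),
  Cat   = c("corn","wheat","quarry","barley","corn","wheat","lake",
            "corn","wheat","quarry","rye"),
  Count = c(3,1,3,4,5,2,4,7,2,9,1),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the categories to merge into "crops"
crop_types <- c("corn", "wheat", "barley", "rye")

# as.data.table() copies the data, so df itself is left untouched
dt <- as.data.table(df)

# Recode Cat on the fly inside `by`, then sum Count per (ID, Cat) group
res <- dt[, .(Count = sum(Count)),
          by = .(ID, Cat = fifelse(Cat %chin% crop_types, "crops", Cat))]
res
```

Listing the categories to merge (rather than the ones to exclude) means that a category added later is not silently counted as a crop; which direction is safer depends on the real data.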
Or use base R. We transform the dataset, replace the elements of "Cat" that are not "quarry" or "lake" with "crops", and then aggregate to get the sum of "Count" grouped by "Cat" and "ID".
df1 <- transform(df, Cat = replace(as.character(Cat),
!(Cat %in% c("quarry", "lake")), "crops"))
aggregate(Count~., df1, sum)
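The base R approach can also be written with an explicit list of the categories to merge, which matches the wording of the question more directly. This is only a sketch under that assumption. The names crop_types, df2, and res are hypothetical, and the final sort is added just to reproduce the row order of the desired output:

```r
# Example data from the question
df <- data.frame(
  ID    = c("A1","A1","A1","A1","A2","A2","A2","A3","A3","A3","A3"),
  Cat   = c("corn","wheat","quarry","barley","corn","wheat","lake",
            "corn","wheat","quarry","rye"),
  Count = c(3,1,3,4,5,2,4,7,2,9,1)
)

# Hypothetical helper vector: the categories to merge into "crops"
crop_types <- c("corn", "wheat", "barley", "rye")

# Work on a copy; as.character() guards against Cat being a factor
df2 <- df
df2$Cat <- ifelse(df2$Cat %in% crop_types, "crops", as.character(df2$Cat))

# Sum Count within each ID/Cat combination
res <- aggregate(Count ~ ID + Cat, data = df2, FUN = sum)

# aggregate() sorts with Cat varying slowest; reorder by ID, then Cat
res <- res[order(res$ID, res$Cat), ]
rownames(res) <- NULL
res
```

With the sample data this gives the six rows shown in the question (crops totals of 8, 7, and 10 for A1, A2, and A3).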