Question

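This may be a silly question, but I have read Crawley's chapter on data frames and searched the internet, and I still haven't managed to get anything working.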

Here is a sample dataset similar to mine:

> data<-data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25))
> data
  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      1    45
2    A buttercup         1          1      2    67
3    A buttercup         2          2      1    32
4    A      rose         1          1      4    43
5    B buttercup         1          1      3    13
6    B      rose         1          2      2    25

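What I would like to do is sum "seeds" and "fruits" for every unique combination of site, plant, treatment and plant_numb that exists. Ideally this would reduce the number of rows while preserving the original columns (i.e. I need the example above to look like this):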

  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      3   112
2    A buttercup         2          2      1    32
3    A      rose         1          1      4    43
4    B buttercup         1          1      3    13
5    B      rose         1          2      2    25
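As a quick sanity check (an illustration, not part of the original question), the desired result has one row per distinct site/plant/treatment/plant_numb combination, so base R's unique() on those four columns tells you how many rows to expect:

```r
# Example data from the question
data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

# The four columns that together define a group
group_cols <- c("site", "plant", "treatment", "plant_numb")

# Distinct combinations of the grouping columns: one output row per combination
combos <- unique(data[group_cols])
n_groups <- nrow(combos)
n_groups
```

Here rows 1 and 2 share the same combination, so six input rows collapse to five groups.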

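This example is very basic (my dataset is about 5000 rows), and although here you only see two rows that need to be summed, the number of rows to be summed varies from 1 to 45.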

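So far I have tried rowsum() and tapply(), with pretty dismal results (errors telling me these functions are not meaningful for factors), so if you could even just point me in the right direction, I would be very grateful!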
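A minimal sketch (not from the original thread) of how rowsum() and tapply() can be made to work here: both expect numeric values to sum plus a single grouping vector, so the factor columns belong in the grouping key rather than in the data being summed. Building that key with interaction() is an assumption of this sketch; pasting the columns together would work too.

```r
data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

# One grouping key built from the four columns; drop = TRUE keeps only combinations that occur
key <- interaction(data$site, data$plant, data$treatment, data$plant_numb, drop = TRUE)

# rowsum(): pass only the numeric columns; row names of the result are the group labels
by_rowsum <- rowsum(data[, c("fruits", "seeds")], group = key)
by_rowsum

# tapply(): one numeric column at a time, summed within each group
by_tapply <- tapply(data$seeds, key, sum)
by_tapply
```

The drawback is that the group labels end up as row names (e.g. "A.buttercup.1.1") rather than as separate columns, which is why the answers below use functions that keep the grouping columns.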

Many thanks!

Answer 1

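Hopefully the following code is fairly self-explanatory. It uses the base function aggregate(); essentially it says: for each unique combination of site, plant, treatment and plant_numb, compute the sum of fruits and the sum of seeds.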

# Create the example data
data <- data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25)) 

# Summarize your data
aggregate(cbind(fruits, seeds) ~ 
      site + plant + treatment + plant_numb, 
      sum, 
      data = data)
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    B buttercup         1          1      3    13
#3    A      rose         1          1      4    43
#4    B      rose         1          2      2    25
#5    A buttercup         2          2      1    32

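The order of the rows changes (the result is sorted by the grouping variables, with site varying fastest and plant_numb slowest), but hopefully that is not too big a problem.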
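If the row order does matter, one option (a sketch, not part of the original answer) is to sort the aggregate() result afterwards with order(), listing the grouping columns from most to least significant:

```r
data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

# Sum fruits and seeds within each combination of the four grouping columns
agg <- aggregate(cbind(fruits, seeds) ~ site + plant + treatment + plant_numb,
                 data = data, FUN = sum)

# Re-sort so site is the primary key, then plant, treatment, plant_numb
agg <- agg[order(agg$site, agg$plant, agg$treatment, agg$plant_numb), ]
rownames(agg) <- NULL  # renumber rows 1..n after sorting
agg
```

This reproduces the row order shown in the question's desired output.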

Another option is ddply() from the plyr package.

library(plyr)
ddply(data, .(site, plant, treatment, plant_numb), 
      summarize, 
      fruits = sum(fruits), 
      seeds = sum(seeds))
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    A buttercup         2          2      1    32
#3    A      rose         1          1      4    43
#4    B buttercup         1          1      3    13
#5    B      rose         1          2      2    25

Answer 2

For completeness, here is the data.table solution, as suggested by @Chase. For larger datasets it is probably the fastest approach:

library(data.table)
data.dt <- data.table(data)
setkey(data.dt, site)  # sorts the table by site; not required for the grouping itself
data.dt[, lapply(.SD, sum), by = list(site, plant, treatment, plant_numb)]

     site     plant treatment plant_numb fruits seeds
[1,]    A buttercup         1          1      3   112
[2,]    A buttercup         2          2      1    32
[3,]    A      rose         1          1      4    43
[4,]    B buttercup         1          1      3    13
[5,]    B      rose         1          2      2    25

The lapply(.SD, sum) part sums every column that is not part of the grouping set (i.e. every column not listed in the by argument).
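To make the summed columns explicit rather than relying on "everything not in by", data.table's .SDcols argument can restrict .SD to named columns. A sketch (not from the original answer), using as.data.table() and the .() alias for list(); without a key, groups come out in order of first appearance:

```r
library(data.table)

data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

dt <- as.data.table(data)

# .SD is the subset of each group; .SDcols limits it to fruits and seeds
res <- dt[, lapply(.SD, sum),
          by = .(site, plant, treatment, plant_numb),
          .SDcols = c("fruits", "seeds")]
res
```

This is handy when the real dataset has extra numeric columns that should not be summed.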

Answer 3

Updating this answer much later: the dplyr / tidyverse solution is simply

library(tidyverse)

data %>% 
  group_by(site, plant, treatment, plant_numb) %>% 
  summarise(fruits=sum(fruits), seeds=sum(seeds))
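A variant of the same idea (a sketch, assuming dplyr 1.0.0 or later for across() and the .groups argument): across() applies sum() to several columns without naming each one twice, and .groups = "drop" returns an ungrouped result instead of one that keeps part of the grouping. Loading dplyr alone is enough; the full tidyverse is not required.

```r
library(dplyr)

data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

res <- data %>%
  group_by(site, plant, treatment, plant_numb) %>%
  # sum each listed column within every group, then drop all grouping
  summarise(across(c(fruits, seeds), sum), .groups = "drop")
res
```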

Sum rows based on specific factor combinations