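Others have asked similar questions, but their data were structured a bit differently. My data set has several columns of grouping variables and several columns of numeric data. I need to sum the numeric values in each row and write the sum to a new column. See the example DATA set and the desired RESULT table below.

I would prefer a solution that uses the mutate function from dplyr, since I mainly use the dplyr package to manipulate my data. I can already do this with gather (from tidyr) followed by group_by and summarise (from dplyr), but I'm working with very large data sets, and the "gathered" table can exceed 2,000,000 rows. Thanks in advance.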
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                  DATE = c("1","1","2","2","3","3","3","4","4"),
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                    DATE = c("1","1","2","2","3","3","3","4","4"),
                    STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                    STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                    SUM_STUFF = c(3, 6, 90, 120, 300, 600, 900, 15000, 18000))
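For comparison, the long-format route described in the question might look roughly like this. It is a sketch under assumptions: tidyr must be installed (gather() lives in tidyr, not dplyr), and the row-id column ROW and the objects long, sums and RESULT_LONG are hypothetical names added so the per-row sums can be joined back. The intermediate table has one row per original row per summed column, which is why it grows so quickly on large data.

```r
library(dplyr)
library(tidyr)

DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Reshape to long format: one row per (original row, numeric column) pair.
# ROW is a hypothetical id column so each value can be traced to its row.
long <- DATA %>%
  mutate(ROW = row_number()) %>%
  gather(key = "VAR", value = "VALUE", STUFF, STUFF2)

nrow(long)  # 9 original rows x 2 gathered columns = 18 rows

# Sum the values belonging to each original row
sums <- long %>%
  group_by(ROW) %>%
  summarise(SUM_STUFF = sum(VALUE))

# Attach the sums back to the wide data and drop the helper id
RESULT_LONG <- DATA %>%
  mutate(ROW = row_number()) %>%
  left_join(sums, by = "ROW") %>%
  select(-ROW)
```

With n rows and k summed columns the long table has n × k rows, so a few hundred thousand rows and a handful of columns already reach the size mentioned in the question.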
Answer (score: 3)
This looks like what you need:
library(dplyr)

RESULT <- DATA %>%
  rowwise() %>%
  mutate(SUM_STUFF = sum(STUFF, STUFF2))
The key is rowwise(), which may have been added to dplyr after you asked your question.
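Because rowwise() evaluates the expression once per row, it can be slow on the very large data sets mentioned in the question. When the columns to add are known, a plain vectorised mutate() produces the same column. The rowSums() variant is a base-R option for many numeric columns; selecting them by the "^STUFF" name prefix, and the names RESULT_VEC, RESULT_ROWSUMS and stuff_cols, are assumptions for illustration.

```r
library(dplyr)

DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Vectorised sum: `+` adds whole columns element-wise, no per-row grouping
RESULT_VEC <- DATA %>%
  mutate(SUM_STUFF = STUFF + STUFF2)

# Many numeric columns: select them by name
# (assumes every column to sum starts with "STUFF")
stuff_cols <- grep("^STUFF", names(DATA), value = TRUE)

# rowSums() sums across the selected columns for every row in one call
RESULT_ROWSUMS <- DATA
RESULT_ROWSUMS$SUM_STUFF <- rowSums(DATA[stuff_cols])
```

Unlike the rowwise() version, neither result carries a row-wise grouping, so no ungroup() is needed before further grouped operations.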
> RESULT
Source: local data frame [9 x 5]
Groups: <by row>
    SITE   DATE STUFF STUFF2 SUM_STUFF
  (fctr) (fctr) (dbl)  (dbl)     (dbl)
1      A      1     1      2         3
2      A      1     2      4         6
3      A      2    30     60        90
4      A      2    40     80       120
5      B      3   100    200       300
6      B      3   200    400       600
7      B      3   300    600       900
8      C      4  5000  10000     15000
9      C      4  6000  12000     18000