Need to sum data row by row across multiple columns using dplyr

Time: 2015-03-11 00:12:28

Tags: r

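Others have asked similar questions, but their data were structured a bit differently. My dataset has several columns of grouping variables and several columns of numeric data. I need to sum the numeric data in each row and write the total to a new column. See the example DATA set and the desired RESULT table below. I would prefer a solution that uses the mutate function in dplyr, since I mainly use the dplyr package to manipulate my datasets. I can accomplish this with gather (from tidyr) followed by group_by and summarise, but I am working with very large datasets, and the "gathered" table could exceed 2,000,000 rows. Thanks in advance.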

DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                  DATE = c("1","1","2","2","3","3","3","4","4"), 
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                    DATE = c("1","1","2","2","3","3","3","4","4"), 
                    STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                    STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                    SUM_STUFF = c(3, 6, 90, 120, 300, 600, 900, 15000, 18000))

1 Answer:

Answer 0: (score: 3)

This looks like it fits your needs:

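library(dplyr)  # provides %>%, rowwise() and mutate()

RESULT <- DATA %>%
  rowwise() %>%                          # treat each row as its own group
  mutate(SUM_STUFF = sum(STUFF, STUFF2)) # sum() now sees one row at a time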

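The key is to use rowwise (which may have been added to dplyr after you asked the question). Without it, sum(STUFF, STUFF2) would add up both columns in their entirety and return a single grand total for every row.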

> RESULT
Source: local data frame [9 x 5]
Groups: <by row>

    SITE   DATE STUFF STUFF2 SUM_STUFF
  (fctr) (fctr) (dbl)  (dbl)     (dbl)
1      A      1     1      2         3
2      A      1     2      4         6
3      A      2    30     60        90
4      A      2    40     80       120
5      B      3   100    200       300
6      B      3   200    400       600
7      B      3   300    600       900
8      C      4  5000  10000     15000
9      C      4  6000  12000     18000
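
Since the question mentions very large datasets, it may be worth noting that rowwise() evaluates the expression once per row, which can be slow. Below is a minimal sketch of two vectorized alternatives that should give the same SUM_STUFF column. The names RESULT_ADD and RESULT_ROWSUMS are made up for illustration, and the second option assumes a dplyr version that provides across() (1.0.0 or later), which is newer than this thread.

```r
library(dplyr)

# Example data from the question
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Option 1: `+` is already vectorized, so no row-wise grouping is needed
RESULT_ADD <- DATA %>%
  mutate(SUM_STUFF = STUFF + STUFF2)

# Option 2: rowSums() over a selection of columns
# (assumes dplyr >= 1.0.0 for across(); handy when there are many columns)
RESULT_ROWSUMS <- DATA %>%
  mutate(SUM_STUFF = rowSums(across(c(STUFF, STUFF2))))
```

Option 1 is the simplest when only a couple of columns are involved. Option 2 scales better to many columns, and rowSums() also accepts na.rm = TRUE if missing values should be skipped.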