Question

Suppose I have a data frame like this:

  year stint  ID  W
1 2003     1 abc 10
2 2003     2 abc  3
3 2003     1 def 16
4 2004     1 abc 15
5 2004     1 def 11
6 2004     2 def  7

I want to combine the data so that it looks like this:

  year  ID  W
1 2003 abc 13
3 2003 def 16
4 2004 abc 15
5 2004 def 18

I found a way to combine the data as needed, but I am fairly sure there is a better way.

combinedData = unique(ddply(data, "ID", function(x) {
    ddply(x, "year", function(y) {
        data.frame(ID=x$ID, W=sum(y$W))
    })
}))
combinedData[order(combinedData$year),]

This produces the following output:

   year  ID  W
1  2003 abc 13
7  2003 def 16
4  2004 abc 15
10 2004 def 18

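Specifically, I don't like that I have to use unique (otherwise each unique combination of year, ID, and W appears three times in the output), and I don't like that the row numbers are not sequential. How can I do this more cleanly?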

Answer 1

You can do this in base R with aggregate, summing W within each combination of year and ID:

aggregate(W~year+ID, df, sum)

#  year  ID  W
#1 2003 abc 13
#2 2004 abc 15
#3 2003 def 16
#4 2004 def 18

Data

df <- structure(list(year = c(2003L, 2003L, 2003L, 2004L, 2004L, 2004L ), stint = c(1L, 2L, 1L, 1L, 1L, 2L), ID = structure(c(1L, 1L, 2L, 1L, 2L, 2L), .Label = c("abc", "def"), class = "factor"), W = c(10L, 3L, 16L, 15L, 11L, 7L)), .Names = c("year", "stint", "ID", "W"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
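The aggregate result above is ordered by ID first, and the question asked for rows ordered by year with sequential row numbers. A minimal sketch of one way to get that in base R follows; the name res is just an illustrative choice, and the data frame is rebuilt with data.frame() so the snippet runs on its own. The commented-out line shows what is, as far as I know, the usual plyr form for grouping on several variables at once, assuming plyr is installed.

```r
# Same data as above, rebuilt so this snippet is self-contained
df <- data.frame(
  year  = c(2003L, 2003L, 2003L, 2004L, 2004L, 2004L),
  stint = c(1L, 2L, 1L, 1L, 1L, 2L),
  ID    = c("abc", "abc", "def", "abc", "def", "def"),
  W     = c(10L, 3L, 16L, 15L, 11L, 7L)
)

# Sum W within each (year, ID) group
res <- aggregate(W ~ year + ID, data = df, FUN = sum)

# Sort by year, then ID, and reset the row names to 1..n
res <- res[order(res$year, res$ID), ]
rownames(res) <- NULL
res

# Assumed plyr equivalent (not run here; requires library(plyr)):
# ddply(df, c("year", "ID"), summarise, W = sum(W))
```

Passing both grouping columns in a single call avoids the nested ddply calls from the question, so unique is no longer needed.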

The correct way to use ddply to split a data frame at multiple levels
