Question

I want to group this data, but apply different functions to different columns when grouping.

ID  type isDesc isImage
1   1    1      0
1   1    0      1
1   1    0      1
4   2    0      1
4   2    1      0
6   1    1      0
6   1    0      1
6   1    0      0

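I want to group by ID and sum the columns isDesc and isImage, but for type I just want to keep its value. type is always the same for every row of a given ID. The result should look like this: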

ID  type isDesc isImage
1   1    1      2
4   2    1      1
6   1    1      1
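To try the answers below, the sample data and the expected result can be recreated as plain R data frames. This is only a convenience sketch: `data` matches the name used in the question's code, while `expected` is a name chosen here for illustration.

```r
# Sample input from the question
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Desired result: one row per ID, type kept as-is,
# isDesc and isImage summed within each ID
expected <- data.frame(
  ID      = c(1, 4, 6),
  type    = c(1, 2, 1),
  isDesc  = c(1, 1, 1),
  isImage = c(2, 1, 1)
)
```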

Currently I am using:

library(plyr)
# sums every numeric column per ID, including type
summarized <- ddply(data, .(ID), numcolwise(sum))

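But that simply sums all the columns, type included. You don't have to use ddply, but if you think it is a good fit for what I'm doing, I'll stick with it. The data.table package is also an option.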
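To see why that call is not enough, here is a base R sketch that behaves like `numcolwise(sum)` on this data. It swaps in `aggregate()` instead of plyr so it runs without extra packages: every column other than ID, type included, is summed within each ID.

```r
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Sum every non-grouping column within each ID,
# which is effectively what numcolwise(sum) does here
summed_all <- aggregate(. ~ ID, data = data, FUN = sum)
print(summed_all)
```

The isDesc and isImage columns come out right, but type becomes 3, 4 and 3 instead of 1, 2 and 1.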

Answer 1

Using data.table:

library(data.table)
dt <- data.table(data, key = "ID")
# type takes its first value in each group; the two flag columns are summed
dt[, list(type = type[1], isDesc = sum(isDesc),
          isImage = sum(isImage)), by = ID]

#    ID type isDesc isImage
# 1:  1    1      1       2
# 2:  4    2      1       1
# 3:  6    1      1       1
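If neither package is available, the same idea (a different function per column, applied within each ID) can be sketched in base R with `split()` and `lapply()`. This is an alternative technique, not part of the original answer, and it is likely to be much slower than data.table on large data.

```r
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Split the rows by ID, then build one summary row per group,
# choosing a different function for each column
pieces <- lapply(split(data, data$ID), function(g) {
  data.frame(
    ID      = g$ID[1],
    type    = g$type[1],       # keep the (constant) type value
    isDesc  = sum(g$isDesc),
    isImage = sum(g$isImage)
  )
})

# Stack the one-row data frames and drop the group names used as row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```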

Using plyr:

ddply(data , .(ID), summarise, type=type[1], isDesc=sum(isDesc), isImage=sum(isImage))
#   ID type isDesc isImage
# 1  1    1      1       2
# 2  4    2      1       1
# 3  6    1      1       1

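Edit: with data.table's .SDcols you can also handle the case where there are too many columns to list by hand: sum the columns that need summing and just take the first value of the others.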

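You can pass either column names or column numbers as the .SDcols argument. For example, this also works:

# sum columns 3 and 4 (isDesc, isImage) within each ID
dt1 <- dt[, lapply(.SD, sum), by=ID, .SDcols=c(3,4)]
# take the first value of column 2 (type) within each ID
dt2 <- dt[, lapply(.SD, head, 1), by=ID, .SDcols=c(2)]
# join the two results on ID
dt2[dt1]
#    ID type isDesc isImage
# 1:  1    1      1       2
# 2:  4    2      1       1
# 3:  6    1      1       1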
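The same split-then-join pattern can be sketched in base R when many columns are involved: list the columns to sum and the columns whose first value should be kept, summarise each set per ID, and merge the two summaries on ID. The vectors `sum_cols` and `first_cols` are hypothetical names introduced here; `aggregate()` and `merge()` are standard base R functions.

```r
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Hypothetical column lists: which columns to sum, which to keep the first value of
sum_cols   <- c("isDesc", "isImage")
first_cols <- c("type")

# Sum the chosen columns within each ID
sums <- aggregate(data[sum_cols], by = list(ID = data$ID), FUN = sum)

# Take the first value of the remaining columns within each ID
firsts <- aggregate(data[first_cols], by = list(ID = data$ID),
                    FUN = function(x) x[1])

# Join the two summaries on ID (the role dt2[dt1] plays in the data.table answer)
result <- merge(firsts, sums, by = "ID")
print(result)
```

Adding a column to either vector is enough to change how it is summarised, which is the main appeal of this approach when there are many columns.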

R - Group data but apply different functions to different columns