Question

我有以下样本数据集。前三列标识特定组。

dat <- read.table(header=TRUE, text="
                 ID  LFrom LTo It1 It2 It3 It4
                  1120    1   2   47  152 259 140
                  2000    1.1   2.1   88  236 251 145
                  2000    1.2   2.1  72  263 331 147
                  1120    1   2   71  207 290 242
                  1120    1   2   47  152 259 140
                  2001    1.2   2.1  72  263 331 147
                  2001    1.1   2   71  207 290 242
                  1120    1   2   47  152 259 140
                  2000    1.1   2.1   88  236 251 145
                  1120    1   2  72  263 331 147
                  2000    1.1   2.1   71  207 290 242
                  ")

dat
     ID LFrom LTo It1 It2 It3 It4
1  1120   1.0 2.0  47 152 259 140
2  2000   1.1 2.1  88 236 251 145
3  2000   1.2 2.1  72 263 331 147
4  1120   1.0 2.0  71 207 290 242
5  1120   1.0 2.0  47 152 259 140
6  2001   1.2 2.1  72 263 331 147
7  2001   1.1 2.0  71 207 290 242
8  1120   1.0 2.0  47 152 259 140
9  2000   1.1 2.1  88 236 251 145
10 1120   1.0 2.0  72 263 331 147
11 2000   1.1 2.1  71 207 290 242

通过使用duplicated功能，我可以找到唯一的组。

dat[!duplicated(dat[,1:3]),]
    ID LFrom LTo It1 It2 It3 It4
1 1120   1.0 2.0  47 152 259 140
2 2000   1.1 2.1  88 236 251 145
3 2000   1.2 2.1  72 263 331 147
6 2001   1.2 2.1  72 263 331 147
7 2001   1.1 2.0  71 207 290 242

独特群体的频率。

aggregate(data = transform(dat[1:3], Freq = seq_len(nrow(dat[1:3]))), Freq ~ ., length)
    ID LFrom LTo Freq
1 1120   1.0 2.0    5
2 2001   1.1 2.0    1
3 2000   1.1 2.1    3
4 2000   1.2 2.1    1
5 2001   1.2 2.1    1

我需要使用plyr或dplyr包来获得如下结果（常见前3列的最后四列的平均值）。任何帮助表示赞赏。

    ID LFrom LTo  It1   It2   It3   It4
1 1120   1.0 2.0 67.5 213.1 285.2 163.5
2 2000   1.1 2.1 69.9 218.6 288.3 173.7
3 2000   1.2 2.1 72.0 263.0 331.0 147.0
4 2001   1.2 2.1 72.0 263.0 331.0 147.0
5 2001   1.1 2.0 71.0 207.0 290.0 242.0

Answer 1

使用data.table：

library(data.table)
setDT(dat)[, lapply(.SD, mean), by=.(ID, LFrom, LTo)]

Answer 2

不确定plyr是否是一项要求或假设，但如果你喜欢基础R：

aggregate(formula = cbind(It1,It2,It3,It4) ~ ID + LFrom + LTo, data = dat, FUN = mean)

唯一组的其他单元格中的平均值

2 个答案: