Finding the mean of certain unique categories in a data frame

Asked: 2017-06-08 16:44:36

Tags: r average

If I have an R data frame that looks like this:

| Value | TestNum | RepNum |
|:-----:|:-------:|:------:|
| 104   |       1 |      1 |
| 101   |       1 |      2 |
| 101   |       1 |      3 |
| 100   |       2 |      1 |
| 100   |       2 |      2 |
| 100   |       2 |      3 |
| 90    |       3 |      1 |
| 90    |       3 |      2 |
| 90    |       3 |      3 |
| 91    |       4 |      1 |
| 94    |       4 |      2 |
| 94    |       4 |      3 |
| 105   |       5 |      1 |
| 105   |       5 |      2 |
| 108   |       5 |      3 |

Is there a way to transform this data frame so that it holds the mean of Value over the 3 RepNum rows for each unique TestNum, and looks like this:

| Mean | TestNum |
|:----:|:-------:|
| 102  |       1 |
| 100  |       2 |
| 90   |       3 |
| 93   |       4 |
| 106  |       5 |

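You can create this example data frame in R by copying, pasting and running the code below (its values for TestNum 1 and 4 differ slightly from the table above).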

Value<-c(100,101,100,100,100,100,90,90,90,93,94,94,105,105,108)
TestNum<-c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
RepNum<-c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)

df<-data.frame(Value,TestNum,RepNum)

Edit: Here is a more "full" example of the data frame. I am starting with this:

| FileName | Version |  Category | Value | TestNum | RepNum |
|:--------:|:-------:|:---------:|:-----:|:-------:|:------:|
| File1    | 1.0.1   | Category1 |   104 |       1 |      1 |
| File1    | 1.0.1   | Category1 |   101 |       1 |      2 |
| File1    | 1.0.1   | Category1 |   101 |       1 |      3 |
| File1    | 1.0.2   | Category1 |   100 |       2 |      1 |
| File1    | 1.0.2   | Category1 |   100 |       2 |      2 |
| File1    | 1.0.2   | Category1 |   100 |       2 |      3 |
| File1    | 1.0.4   | Category1 |    90 |       3 |      1 |
| File1    | 1.0.4   | Category1 |    90 |       3 |      2 |
| File1    | 1.0.4   | Category1 |    90 |       3 |      3 |
| File1    | 1.0.5   | Category1 |    94 |       4 |      1 |
| File1    | 1.0.5   | Category1 |    91 |       4 |      2 |
| File1    | 1.0.5   | Category1 |    94 |       4 |      3 |
| File1    | 1.0.8   | Category1 |   105 |       5 |      1 |
| File1    | 1.0.8   | Category1 |   105 |       5 |      2 |
| File1    | 1.0.8   | Category1 |   108 |       5 |      3 |

and hoping to end up with this:

| FileName | Version |  Category | Mean_Value | TestNum |
|:--------:|:-------:|:---------:|:----------:|:-------:|
| File1    | 1.0.1   | Category1 |        102 |       1 |
| File1    | 1.0.2   | Category1 |        100 |       2 |
| File1    | 1.0.4   | Category1 |         90 |       3 |
| File1    | 1.0.5   | Category1 |         93 |       4 |
| File1    | 1.0.8   | Category1 |        106 |       5 |

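As you may have noticed, the FileName and Category columns each have only one unique entry, and the Version column changes together with the TestNum column. So it may be easiest to simply add the other columns back after I have found the means.

In the "full" code I am working on, I take the mean for several different files and many unique categories, but so far I have done it by creating multiple data frames, each made by subsetting the original data frame on FileName and Category (plus one additional "Case" column).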

2 answers:

Answer 0 (score: 2)

You can use aggregate:

aggregate(x = df$Value, by = list(df$TestNum), FUN = mean)
#  Group.1         x
#1       1 100.33333
#2       2 100.00000
#3       3  90.00000
#4       4  93.66667
#5       5 106.00000
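
As a variant, `aggregate` also has a formula interface, which keeps the original column names in the result instead of `Group.1` and `x`. A minimal sketch, assuming the `df` from the question (recreated here so the block runs on its own); the name `Mean` is only chosen to match the desired output:

```r
# Recreate the example data frame from the question
Value   <- c(100, 101, 100, 100, 100, 100, 90, 90, 90, 93, 94, 94, 105, 105, 108)
TestNum <- rep(1:5, each = 3)
RepNum  <- rep(1:3, times = 5)
df <- data.frame(Value, TestNum, RepNum)

# Mean of Value for each TestNum; the result columns are named TestNum and Value
means <- aggregate(Value ~ TestNum, data = df, FUN = mean)

# Rename the summary column (illustrative name, not required)
names(means)[names(means) == "Value"] <- "Mean"
means
```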

You can also split Value by the unique values of TestNum first and then summarise each group:

data.frame(test_num = unique(df$TestNum), mean_value = sapply(split(df$Value, df$TestNum), mean))
#  test_num mean_value
#1        1  100.33333
#2        2  100.00000
#3        3   90.00000
#4        4   93.66667
#5        5  106.00000

Answer 1 (score: 1)

Similarly, with data.table or dplyr you can do:

library(data.table)
setDT(df)[, mean(Value), by = TestNum]

library(dplyr)
df %>% group_by(TestNum) %>% summarise(mean(Value))

If there are other columns, you can take the first value of each of them within each TestNum. Like this:

df2<-data.frame(FileName = "File1", 
                Version = paste0("1.0.", rep(c(1,2,4,5,8), each = 3)),
                Value, TestNum, RepNum)


## data.table 
keep_cols <- c("FileName", "Version")
setDT(df2)[, c(lapply(.SD, function(x) x[1]), mean_Value = mean(Value)), 
           by = TestNum, .SDcols = keep_cols]

## dplyr
df2 %>% group_by(TestNum) %>% summarise(FileName = FileName[1], 
                                        Version = Version[1], 
                                        mean_Value = mean(Value))
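
For the multi-file case described in the question, it may be possible to avoid building one subset per file and category by grouping on all identifying columns at once. A sketch in base R, assuming a hypothetical data frame `df_full` with the columns from the fuller example table; the second file and its values are made up for illustration:

```r
# Hypothetical data: two files, each with two TestNum groups of three repeats
df_full <- data.frame(
  FileName = rep(c("File1", "File2"), each = 6),
  Version  = rep(c("1.0.1", "1.0.2", "1.0.1", "1.0.2"), each = 3),
  Category = "Category1",
  Value    = c(104, 101, 101, 100, 100, 100, 90, 90, 90, 94, 91, 94),
  TestNum  = rep(c(1, 2, 1, 2), each = 3),
  RepNum   = rep(1:3, times = 4)
)

# One mean per combination of grouping columns that occurs in the data
res <- aggregate(Value ~ FileName + Version + Category + TestNum,
                 data = df_full, FUN = mean)

# Rename the summary column to match the desired output table
names(res)[names(res) == "Value"] <- "Mean_Value"
res
```

Because every identifying column is part of the grouping, FileName, Version and Category are carried into the result and do not need to be added back afterwards.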