If I have an R data frame like the one below:
| Value | TestNum | RepNum |
|:-----:|:-------:|:------:|
| 104 | 1 | 1 |
| 101 | 1 | 2 |
| 101 | 1 | 3 |
| 100 | 2 | 1 |
| 100 | 2 | 2 |
| 100 | 2 | 3 |
| 90 | 3 | 1 |
| 90 | 3 | 2 |
| 90 | 3 | 3 |
| 91 | 4 | 1 |
| 94 | 4 | 2 |
| 94 | 4 | 3 |
| 105 | 5 | 1 |
| 105 | 5 | 2 |
| 108 | 5 | 3 |
Is there a way to transform this data frame so that it gives the mean of the 3 `RepNum` values for each unique `TestNum`, and looks like this:
| Mean | TestNum |
|:----:|:-------:|
| 102 | 1 |
| 100 | 2 |
| 90 | 3 |
| 93 | 4 |
| 106 | 5 |
You can create this example data frame in R by copying, pasting and running this code:
```r
Value <- c(100, 101, 100, 100, 100, 100, 90, 90, 90, 93, 94, 94, 105, 105, 108)
TestNum <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5)
RepNum <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
df <- data.frame(Value, TestNum, RepNum)
```
Edit: here is a more "complete" example of the data frame I am starting with:
| FileName | Version | Category | Value | TestNum | RepNum |
|:--------:|:-------:|:---------:|:-----:|:-------:|:------:|
| File1 | 1.0.1 | Category1 | 104 | 1 | 1 |
| File1 | 1.0.1 | Category1 | 101 | 1 | 2 |
| File1 | 1.0.1 | Category1 | 101 | 1 | 3 |
| File1 | 1.0.2 | Category1 | 100 | 2 | 1 |
| File1 | 1.0.2 | Category1 | 100 | 2 | 2 |
| File1 | 1.0.2 | Category1 | 100 | 2 | 3 |
| File1 | 1.0.4 | Category1 | 90 | 3 | 1 |
| File1 | 1.0.4 | Category1 | 90 | 3 | 2 |
| File1 | 1.0.4 | Category1 | 90 | 3 | 3 |
| File1 | 1.0.5 | Category1 | 94 | 4 | 1 |
| File1 | 1.0.5 | Category1 | 91 | 4 | 2 |
| File1 | 1.0.5 | Category1 | 94 | 4 | 3 |
| File1 | 1.0.8 | Category1 | 105 | 5 | 1 |
| File1 | 1.0.8 | Category1 | 105 | 5 | 2 |
| File1 | 1.0.8 | Category1 | 108 | 5 | 3 |
And this is what I would like to end up with:
| FileName | Version | Category | Mean_Value | TestNum |
|:--------:|:-------:|:---------:|:----------:|:-------:|
| File1 | 1.0.1 | Category1 | 102 | 1 |
| File1 | 1.0.2 | Category1 | 100 | 2 |
| File1 | 1.0.4 | Category1 | 90 | 3 |
| File1 | 1.0.5 | Category1 | 93 | 4 |
| File1 | 1.0.8 | Category1 | 106 | 5 |
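As you may have noticed, the `FileName` column and the `Category` column each contain only one unique entry, and the `Version` column changes together with the `TestNum` column. So it is probably easiest to simply add the other columns back after I have found the means.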
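A minimal base-R sketch of that plan, assuming (as stated above) that `FileName`, `Version` and `Category` are constant within each `TestNum`: compute the means first, then merge the key columns back on by `TestNum`. The data frame reuses the values from the edited table; `means`, `keys` and `result` are illustrative names only.

```r
# Rebuild the "complete" example from the edit
df2 <- data.frame(
  FileName = "File1",
  Version  = rep(c("1.0.1", "1.0.2", "1.0.4", "1.0.5", "1.0.8"), each = 3),
  Category = "Category1",
  Value    = c(104, 101, 101, 100, 100, 100, 90, 90, 90, 94, 91, 94, 105, 105, 108),
  TestNum  = rep(1:5, each = 3),
  RepNum   = rep(1:3, times = 5)
)

# Step 1: mean Value per TestNum
means <- aggregate(Value ~ TestNum, data = df2, FUN = mean)
names(means)[names(means) == "Value"] <- "Mean_Value"

# Step 2: one row of the (assumed constant) descriptive columns per TestNum
keys <- unique(df2[, c("FileName", "Version", "Category", "TestNum")])

# Step 3: join the means back onto those columns and order them as desired
result <- merge(keys, means, by = "TestNum")
result <- result[, c("FileName", "Version", "Category", "Mean_Value", "TestNum")]
result
```

If the assumption does not hold (for example, two versions within one `TestNum`), `keys` would contain more than one row per `TestNum` and the merge would duplicate the mean.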
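In the "full" code I am working on, I compute the means for several different files and many unique categories, but I keep creating multiple data frames by subsetting the original data frame on `FileName` and `Category` (plus one more "Case" column).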
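One way to avoid building a separate data frame per file and category is to group by all of those columns in a single `aggregate()` call. The sketch below uses a small made-up data frame (`dat`, with invented `Value`s) containing two files and two categories; a `Case` column could be added to the right-hand side of the formula in the same way.

```r
# Hypothetical data: 2 files x 2 categories x 2 tests x 3 reps = 24 rows
dat <- expand.grid(RepNum = 1:3, TestNum = 1:2,
                   Category = c("Category1", "Category2"),
                   FileName = c("File1", "File2"),
                   stringsAsFactors = FALSE)
dat$Value <- seq_len(nrow(dat))  # invented values 1..24, for illustration only

# One row per FileName/Category/TestNum combination, no manual subsetting
means <- aggregate(Value ~ FileName + Category + TestNum, data = dat, FUN = mean)
means
```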
Answer 0 (score: 2)
You can use `aggregate`:

```r
aggregate(x = df$Value, by = list(df$TestNum), FUN = mean)
#   Group.1         x
# 1       1 100.33333
# 2       2 100.00000
# 3       3  90.00000
# 4       4  93.66667
# 5       5 106.00000
```
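The formula interface of `aggregate()` keeps the original column names instead of `Group.1`/`x`, so getting the `Mean`/`TestNum` layout from the question takes only a rename. A sketch on the question's example data:

```r
# Same toy data as in the question
Value <- c(100, 101, 100, 100, 100, 100, 90, 90, 90, 93, 94, 94, 105, 105, 108)
TestNum <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5)
RepNum <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
df <- data.frame(Value, TestNum, RepNum)

# "Value grouped by TestNum"; the output columns are named TestNum and Value
res <- aggregate(Value ~ TestNum, data = df, FUN = mean)

# Rename the summary column and put the columns in the requested order
names(res)[names(res) == "Value"] <- "Mean"
res <- res[, c("Mean", "TestNum")]
res
```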
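You can also `split` the values by `TestNum` first and then summarise each group:

```r
# split() orders the groups by sorted TestNum, so sort the labels to match
data.frame(test_num = sort(unique(df$TestNum)),
           mean_value = sapply(split(df$Value, df$TestNum), mean))
#   test_num mean_value
# 1        1  100.33333
# 2        2  100.00000
# 3        3   90.00000
# 4        4   93.66667
# 5        5  106.00000
```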
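`tapply()` does the split-then-apply step in a single call and returns a vector named by the group values. A sketch on the same data; reading `test_num` back from those names keeps the two columns aligned even if `TestNum` were not sorted (`means` and `out` are illustrative names):

```r
Value <- c(100, 101, 100, 100, 100, 100, 90, 90, 90, 93, 94, 94, 105, 105, 108)
TestNum <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5)

# Split Value by TestNum and take the mean of each piece
means <- tapply(Value, TestNum, mean)

# The names of the result are the TestNum values as strings
out <- data.frame(test_num = as.numeric(names(means)),
                  mean_value = as.vector(means))
out
```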
Answer 1 (score: 1)
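Alternatively, with `data.table` or `dplyr` you can do:

```r
library(data.table)
# setDT() converts df to a data.table in place; the unnamed result column is V1
setDT(df)[, mean(Value), by = TestNum]

library(dplyr)
# the unnamed summary column is called "mean(Value)"
df %>% group_by(TestNum) %>% summarise(mean(Value))
```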
If there are other columns, you can keep the first value of each of them within every `TestNum`, like this:
```r
df2 <- data.frame(FileName = "File1",
                  Version = paste0("1.0.", rep(c(1, 2, 4, 5, 8), each = 3)),
                  Value, TestNum, RepNum)

## data.table
keep_cols <- c("FileName", "Version")
setDT(df2)[, c(lapply(.SD, function(x) x[1]), mean_Value = mean(Value)),
           by = TestNum, .SDcols = keep_cols]

## dplyr
df2 %>% group_by(TestNum) %>% summarise(FileName = FileName[1],
                                        Version = Version[1],
                                        mean_Value = mean(Value))
```
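For comparison, a base-R sketch of the same "first value per group" idea without extra packages: `duplicated()` picks the first row of each `TestNum`, and the means are matched to those rows by name. It rebuilds `df2` the same way as above; `first_rows` and `means` are illustrative names.

```r
Value <- c(100, 101, 100, 100, 100, 100, 90, 90, 90, 93, 94, 94, 105, 105, 108)
TestNum <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5)
RepNum <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
df2 <- data.frame(FileName = "File1",
                  Version = paste0("1.0.", rep(c(1, 2, 4, 5, 8), each = 3)),
                  Value, TestNum, RepNum)

# Keep the first row of each TestNum (the x[1] idea from above)
first_rows <- df2[!duplicated(df2$TestNum), c("FileName", "Version", "TestNum")]

# Mean Value per TestNum, looked up by name so the order cannot drift
means <- tapply(df2$Value, df2$TestNum, mean)
first_rows$mean_Value <- as.vector(means[as.character(first_rows$TestNum)])
rownames(first_rows) <- NULL
first_rows
```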