Question

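I have a dataset of 176 plant populations (genotypes), each measured in 3 replicates (R1, R2, R3). I made a table (below). Now I want to take the mean of R1, R2 and R3 for each genotype and write the values into new columns of my .CSV data file. Can I do this in R? Please help.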

Demo file:

| geno  | trait1    | trait2    | trait3    | trait4    |
|------ |--------   |--------   |--------   |--------   |
| 1_R1  | 1.891     | 2.561     | 0.9       | 11        |
| 1_R2  | 10.341    | 2.121     | 0.6       | 2         |
| 1_R3  | 9.451     | 6.781     | 4.56      | 7         |
| 2_R1  | 11.09     | 9.191     |           | 8         |

Answer 1

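It would be easier to do this in dplyr. Assuming that the "geno" column holds both the "id" and the replicate information, we first need to split the "geno" column, which separate does. Then, after grouping by "id", use mutate_each to replace each trait column with its per-id mean. mutate_each has options for choosing the columns: starts_with, ends_with, contains, matches, etc. Here I used - to exclude the column "geno1". After that, unite the columns "id" and "geno1" back into a single column "geno", and use left_join to add the means to the original data, df.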

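library(dplyr)
library(tidyr)
 # split "1_R1" into id = "1" and geno1 = "R1", then replace every trait
 # column (everything except geno1) by its mean within each id
 df1 <- df %>%
            separate(geno, c('id', 'geno1'), sep = '_') %>%
            group_by(id) %>%
            mutate_each(funs(mean(., na.rm = TRUE)), -geno1) %>%
            ungroup() %>%
            unite(geno, id, geno1)   # rebuild "1_R1", "1_R2", ...
 # column 1 is geno; rename the rest to trait1_mean, trait2_mean, ...
 colnames(df1)[-1] <- paste(colnames(df1)[-1], 'mean', sep = "_")
 # attach the means to the original rows
 left_join(df, df1, by = 'geno')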
 #  geno trait1 trait2 trait3 trait4 trait1_mean trait2_mean trait3_mean
 #1 1_R1  1.891  2.561   0.90     11    7.227667       3.821        2.02
 #2 1_R2 10.341  2.121   0.60      2    7.227667       3.821        2.02
 #3 1_R3  9.451  6.781   4.56      7    7.227667       3.821        2.02
 #4 2_R1 11.090  9.191     NA      8   11.090000       9.191         NaN
 #  trait4_mean
 #1    6.666667
 #2    6.666667
 #3    6.666667
 #4    8.000000
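In current dplyr releases, mutate_each() and funs() are deprecated, and mutate(across()) (dplyr 1.0.0 or later) is the usual replacement. A minimal sketch of the same per-genotype means with across(), assuming dplyr 1.0+ and that every trait column name starts with "trait"; the .names pattern adds the _mean suffix directly, so the separate/unite/left_join round trip is not needed.

```r
library(dplyr)

# Demo data from the question
df <- data.frame(
  geno   = c("1_R1", "1_R2", "1_R3", "2_R1"),
  trait1 = c(1.891, 10.341, 9.451, 11.09),
  trait2 = c(2.561, 2.121, 6.781, 9.191),
  trait3 = c(0.9, 0.6, 4.56, NA),
  trait4 = c(11L, 2L, 7L, 8L)
)

df_means <- df %>%
  # group on the part of "geno" before the underscore ("1_R1" -> "1")
  group_by(id = sub("_.*", "", geno)) %>%
  # add trait1_mean ... trait4_mean next to the original columns;
  # every row of a genotype gets the same mean
  mutate(across(starts_with("trait"),
                ~ mean(.x, na.rm = TRUE),
                .names = "{.col}_mean")) %>%
  ungroup() %>%
  select(-id)   # drop the helper grouping column

print(df_means)
```

As in the answer's output, a genotype whose only value is NA (trait3 for genotype 2) gets NaN, because mean(NA, na.rm = TRUE) is NaN.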

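Or it is relatively easy with data.table. Convert the data.frame to a data.table with setDT. Create the new columns by assigning (:=) the mean of each column to the names in nm1. lapply(.SD, mean, na.rm=TRUE) computes the mean of each column specified in .SDcols, grouped by the id taken from the part of "geno" before the underscore.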

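 library(data.table)
 # new column names: trait1_mean, trait2_mean, ...
 nm1 <- paste(colnames(df)[-1], 'mean', sep="_")
 # per-id mean of columns 2:5 (the traits), added by reference as new columns
 setDT(df)[, (nm1) := lapply(.SD, mean, na.rm=TRUE),
              by = list(id=sub('_.*', '', geno)), .SDcols=2:5]
 df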

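Or, if you just need a summary of the mean of each column by "id", you can do it in base R. Make sure to specify na.action=na.pass; otherwise the default setting removes every row that contains an NA, which gives a different output.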

df$id <- sub('_.*', '', df$geno)
aggregate(.~id, df[-1], FUN=mean, na.action=na.pass)
#  id    trait1 trait2 trait3   trait4
#1  1  7.227667  3.821   2.02 6.666667
#2  2 11.090000  9.191     NA 8.000000
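The question also asks for the means as new columns on every row, written back to the .CSV file. A minimal base-R sketch, assuming the trait columns follow the trait1 to trait4 naming of the demo table and that the genotype id is the part of "geno" before the underscore. ave() returns a vector as long as its input, so each row receives the mean of its own genotype. The output path is a hypothetical placeholder; replace it (and the demo data) with your own file.

```r
# Demo data from the question; with a real file you would use something like
# df <- read.csv("traits.csv")   # hypothetical file name
df <- data.frame(
  geno   = c("1_R1", "1_R2", "1_R3", "2_R1"),
  trait1 = c(1.891, 10.341, 9.451, 11.09),
  trait2 = c(2.561, 2.121, 6.781, 9.191),
  trait3 = c(0.9, 0.6, 4.56, NA),
  trait4 = c(11L, 2L, 7L, 8L)
)

# genotype id = part of "geno" before the underscore ("1_R1" -> "1")
id <- sub("_.*", "", df$geno)
trait_cols <- grep("^trait", names(df), value = TRUE)

# add one <trait>_mean column per trait; ave() repeats each group's
# mean on every row of that group
for (col in trait_cols) {
  df[[paste0(col, "_mean")]] <- ave(df[[col]], id,
                                    FUN = function(x) mean(x, na.rm = TRUE))
}

# write the table, now including the *_mean columns, back to CSV
out_file <- file.path(tempdir(), "traits_with_means.csv")  # hypothetical path
write.csv(df, out_file, row.names = FALSE)
print(df)
```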

Data

df <- structure(list(geno = c("1_R1", "1_R2", "1_R3", "2_R1"), 
trait1 = c(1.891, 10.341, 9.451, 11.09), trait2 = c(2.561, 2.121, 6.781, 
9.191), trait3 = c(0.9, 0.6, 4.56, NA), trait4 = c(11L, 2L, 7L, 8L
)), .Names = c("geno", "trait1", "trait2", "trait3", "trait4"
 ), class = "data.frame", row.names = c(NA, -4L))
