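I have a large gene expression data frame in which some genes appear in several rows (duplicates), and each column is a different sample (group). For duplicated genes, I need to collapse the rows into a single row that holds, for each group (column), the mean of the duplicates.
Here is a small example of my data frame: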
GENES=c("7A5", "A1BG", "A1BG", "A1BG","AAAS","AAAS", "AFDS","AFDS","AFDS")
Group1 = c(2.1471840, -0.9092227, -1.4875100, -2.79559765, 0.05143231, -1.25764808, 0.6104962, 0.09226673, -0.8037355)
Group2 = c(-0.3709474, 1.4587290, 1.4545832, -0.27379895, -0.45116476, 1.56286706, -0.9225275, -0.54779659, -1.0586287)
Group3 = c(-1.1321667, -1.3051079, -0.9658358, -0.05914144, -0.20133056, 0.03029207, 1.0015907, 1.18145151, 0.5360956)
Group4 = c(0.6824169, 0.1645328, 2.6276603, 1.11739548, -1.13592005, -0.12666909, -0.4667365, -0.80153098, -1.1085319)
Group5 = c(1.1014914, -1.4461279, 1.0965057, -1.58379531, -0.12457328, 0.59232328, 0.2319656, 0.46981373, -0.4540254)
df=data.frame(GENES,Group1,Group2,Group3,Group4,Group5)
> df
GENES Group1 Group2 Group3 Group4 Group5
1 7A5 2.14718400 -0.3709474 -1.13216670 0.6824169 1.1014914
2 A1BG -0.90922270 1.4587290 -1.30510790 0.1645328 -1.4461279
3 A1BG -1.48751000 1.4545832 -0.96583580 2.6276603 1.0965057
4 A1BG -2.79559765 -0.2737989 -0.05914144 1.1173955 -1.5837953
5 AAAS 0.05143231 -0.4511648 -0.20133056 -1.1359200 -0.1245733
6 AAAS -1.25764808 1.5628671 0.03029207 -0.1266691 0.5923233
7 AFDS 0.61049620 -0.9225275 1.00159070 -0.4667365 0.2319656
8 AFDS 0.09226673 -0.5477966 1.18145151 -0.8015310 0.4698137
9 AFDS -0.80373550 -1.0586287 0.53609560 -1.1085319 -0.4540254
For example, gene A1BG has 3 duplicates. So for the new Group1 value of A1BG, I need:
mean(c(df[2,2], df[3,2], df[4,2]))
For Group2, I need:
mean(c(df[2,3], df[3,3], df[4,3]))
and likewise for all the other groups.
Answer 0 (score: 2)
Use the summarise_all() function from dplyr:
library(dplyr)
df1 <- df %>%
group_by(GENES) %>%
summarise_all(mean)
Result:
# A tibble: 4 x 6
GENES Group1 Group2 Group3 Group4 Group5
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7A5 2.15 -0.371 -1.13 0.682 1.10
2 A1BG -1.73 0.880 -0.777 1.30 -0.644
3 AAAS -0.603 0.556 -0.0855 -0.631 0.234
4 AFDS -0.0337 -0.843 0.906 -0.792 0.0826
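In dplyr 1.0.0 and later, summarise_all() is marked as superseded in favour of across(). A minimal sketch of the same aggregation in that style, assuming dplyr >= 1.0.0 is installed and using the example data from the question (the name df1 simply mirrors the code above):

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  GENES  = c("7A5", "A1BG", "A1BG", "A1BG", "AAAS", "AAAS", "AFDS", "AFDS", "AFDS"),
  Group1 = c(2.1471840, -0.9092227, -1.4875100, -2.79559765, 0.05143231, -1.25764808, 0.6104962, 0.09226673, -0.8037355),
  Group2 = c(-0.3709474, 1.4587290, 1.4545832, -0.27379895, -0.45116476, 1.56286706, -0.9225275, -0.54779659, -1.0586287),
  Group3 = c(-1.1321667, -1.3051079, -0.9658358, -0.05914144, -0.20133056, 0.03029207, 1.0015907, 1.18145151, 0.5360956),
  Group4 = c(0.6824169, 0.1645328, 2.6276603, 1.11739548, -1.13592005, -0.12666909, -0.4667365, -0.80153098, -1.1085319),
  Group5 = c(1.1014914, -1.4461279, 1.0965057, -1.58379531, -0.12457328, 0.59232328, 0.2319656, 0.46981373, -0.4540254)
)

# One row per gene; every non-grouping column is replaced by its per-gene mean
df1 <- df %>%
  group_by(GENES) %>%
  summarise(across(everything(), mean), .groups = "drop")

df1
```

Inside a grouped summarise(), across(everything(), ...) skips the grouping column, so only the Group columns are averaged.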
Answer 1 (score: 2)
In base R:
aggregate(.~GENES,df,mean)
# GENES Group1 Group2 Group3 Group4 Group5
# 1 7A5 2.14718400 -0.3709474 -1.13216670 0.6824169 1.10149140
# 2 A1BG -1.73077678 0.8798377 -0.77669505 1.3031962 -0.64447250
# 3 AAAS -0.60310789 0.5558512 -0.08551924 -0.6312946 0.23387500
# 4 AFDS -0.03365752 -0.8429843 0.90637927 -0.7922665 0.08258464
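As a quick sanity check, the aggregated A1BG row can be compared with the column means of the three A1BG rows computed directly. This is only a sketch on the example data from the question; the names res and a1bg are introduced here for illustration.

```r
# Example data copied from the question
df <- data.frame(
  GENES  = c("7A5", "A1BG", "A1BG", "A1BG", "AAAS", "AAAS", "AFDS", "AFDS", "AFDS"),
  Group1 = c(2.1471840, -0.9092227, -1.4875100, -2.79559765, 0.05143231, -1.25764808, 0.6104962, 0.09226673, -0.8037355),
  Group2 = c(-0.3709474, 1.4587290, 1.4545832, -0.27379895, -0.45116476, 1.56286706, -0.9225275, -0.54779659, -1.0586287),
  Group3 = c(-1.1321667, -1.3051079, -0.9658358, -0.05914144, -0.20133056, 0.03029207, 1.0015907, 1.18145151, 0.5360956),
  Group4 = c(0.6824169, 0.1645328, 2.6276603, 1.11739548, -1.13592005, -0.12666909, -0.4667365, -0.80153098, -1.1085319),
  Group5 = c(1.1014914, -1.4461279, 1.0965057, -1.58379531, -0.12457328, 0.59232328, 0.2319656, 0.46981373, -0.4540254)
)

# Per-gene means for every group column, as in the answer above
res <- aggregate(. ~ GENES, data = df, FUN = mean)

# Direct computation for one gene: column means over its duplicate rows
a1bg <- colMeans(df[df$GENES == "A1BG", -1])

# Compare the aggregated row with the direct computation
all.equal(unname(unlist(res[res$GENES == "A1BG", -1])), unname(a1bg))
```

Note that the formula interface of aggregate() uses na.action = na.omit by default, so rows containing an NA in any column would be dropped before averaging. The example data has no missing values, so this does not affect the result here.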