I have a data frame (example below) with 943 columns and 500 rows.
df <-data.frame(Rep=c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), Depth=c("D", "D", "D", "M", "M", "M", "D", "D", "D", "M", "M", "D", "D"), T0= c(-165,-163,-160,-161,-270,165,-163,-160,-161,-270,-181,-231, -230), T0.01= c(458,459,457,342,158,458,459,457,342,158,324,333,320), T0.02=c(-151,-153,-131,-125,-130,-151,-153,-131,-125,-130,-120, -130,-120))
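I need to get the column medians for columns 7:943 of my dataset (all the columns with numeric data; their headers also all start with "T", e.g. T0, T0.01, etc.). However, I only need the column medians for specific subsets of rows. The subsets are defined by "Rep" and "Depth". For example, I need one vector of column medians for "Rep 1 at Depth D", then another vector of column medians for "Rep 1 at Depth M". I have 24 Reps and 3 Depths in total and need the median vectors for every combination, which gives 3 x 24 = 72 vectors. The result would be a table structured like this (a transposed version is fine too):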
df <-data.frame(Rep=c(1, 1, 1, 2, 2, 2), Depth=c("D", "M", "S", "D", "M", "S"), T0= c(-163,-160,-161,-270,165, 165), T0.01= c(458,459,457,342,158,458), T0.02=c(-151,-153,-131,-125,-130,-151))
Rep Depth T0 T0.01 T0.02
1 D -163 458 -151
1 M -160 459 -153
1 S -161 457 -131
2 D -270 342 -125
2 M 165 158 -130
2 S 165 458 -151
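In addition, I need to calculate the variance of all the cells in columns 7:943 (the "T" columns) for these same subsets of the data. This would produce a single number (rather than a vector) for each subset.
I have tried the subset, tapply, and grepl functions for all of this, but I can't seem to get them to do what I want. Thanks.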
Answer 0 (score: 0)
Using the data you provided:
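library(dplyr)

# Median and variance of every non-grouping column, per Rep/Depth combination.
# Note: summarise_each() and funs() are deprecated in current dplyr releases.
df %>%
  group_by(Rep, Depth) %>%
  summarise_each(funs(median, var))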
Rep Depth T0_median T0.01_median T0.02_median T0_var T0.01_var T0.02_var
(dbl) (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 1 D -163.0 458.0 -151.0 6.333333 1.000 148.0000
2 1 M -215.5 250.0 -127.5 5940.500000 16928.000 12.5000
3 2 D -161.0 457.0 -131.0 2.333333 4486.333 217.3333
4 2 M 165.0 458.0 -151.0 NA NA NA
5 3 D -230.5 326.5 -125.0 0.500000 84.500 50.0000
6 3 M -225.5 241.0 -125.0 3960.500000 13778.000 50.0000
Or, if you want to make the grouping more descriptive:
df %>%
  mutate(group = paste("Rep", Rep, "at Depth", Depth)) %>%
  group_by(group) %>%
  summarise_each(funs(median, var), matches("^T"))
group T0_median T0.01_median T0.02_median T0_var T0.01_var T0.02_var
(chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Rep 1 at Depth D -163.0 458.0 -151.0 6.333333 1.000 148.0000
2 Rep 1 at Depth M -215.5 250.0 -127.5 5940.500000 16928.000 12.5000
3 Rep 2 at Depth D -161.0 457.0 -131.0 2.333333 4486.333 217.3333
4 Rep 2 at Depth M 165.0 458.0 -151.0 NA NA NA
5 Rep 3 at Depth D -230.5 326.5 -125.0 0.500000 84.500 50.0000
6 Rep 3 at Depth M -225.5 241.0 -125.0 3960.500000 13778.000 50.0000
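On a recent dplyr release, where summarise_each() and funs() are deprecated, roughly the same summary could be written with across(). This is only a sketch against the sample data, assuming dplyr 1.0 or later; `res` is a name introduced here for illustration, and the case-sensitive starts_with("T") selector assumes that only the data columns begin with a capital "T".

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Rep   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  Depth = c("D", "D", "D", "M", "M", "M", "D", "D", "D", "M", "M", "D", "D"),
  T0    = c(-165, -163, -160, -161, -270, 165, -163, -160, -161, -270, -181, -231, -230),
  T0.01 = c(458, 459, 457, 342, 158, 458, 459, 457, 342, 158, 324, 333, 320),
  T0.02 = c(-151, -153, -131, -125, -130, -151, -153, -131, -125, -130, -120, -130, -120)
)

# One row per Rep/Depth combination; each "T" column yields a
# <column>_median and a <column>_var column.
res <- df %>%
  group_by(Rep, Depth) %>%
  summarise(
    across(starts_with("T", ignore.case = FALSE),
           list(median = median, var = var)),
    .groups = "drop"
  )

print(res)
```

The variance comes back as NA for any group that contains a single row (Rep 2 at Depth M in the sample), since a sample variance needs at least two observations. With across(), the result columns are ordered per input column (T0_median, T0_var, T0.01_median, ...) rather than all medians first.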
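UPDATE: So, for the group variance across all of the data columns, is this what you mean (the do statement is probably more complicated than it needs to be):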
df %>%
  mutate(group = paste("Rep", Rep, "at Depth", Depth)) %>%
  select(-Rep, -Depth) %>%
  group_by(group) %>%
  do(data.frame(variance = var(unlist(.[, sapply(., is.numeric)]))))
group variance
(chr) (dbl)
1 Rep 1 at Depth D 93682.36
2 Rep 1 at Depth M 53501.60
3 Rep 2 at Depth D 81997.03
4 Rep 2 at Depth M 92764.33
5 Rep 3 at Depth D 70057.87
6 Rep 3 at Depth M 51781.50
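Since the question mentions base R attempts, the same pooled variance could also be sketched without dplyr by splitting the "T" columns by group and flattening each piece. The names `t_cols`, `groups`, and `pooled_var` are made up for this example, and it assumes the data columns are exactly those whose names start with "T".

```r
# Sample data from the question
df <- data.frame(
  Rep   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  Depth = c("D", "D", "D", "M", "M", "M", "D", "D", "D", "M", "M", "D", "D"),
  T0    = c(-165, -163, -160, -161, -270, 165, -163, -160, -161, -270, -181, -231, -230),
  T0.01 = c(458, 459, 457, 342, 158, 458, 459, 457, 342, 158, 324, 333, 320),
  T0.02 = c(-151, -153, -131, -125, -130, -151, -153, -131, -125, -130, -120, -130, -120)
)

# Names of the data columns (those starting with a capital "T")
t_cols <- grep("^T", names(df), value = TRUE)

# One data frame of "T" columns per Rep/Depth combination;
# drop = TRUE discards combinations that never occur.
groups <- split(df[t_cols], list(df$Rep, df$Depth), drop = TRUE)

# Flatten every cell of a group into one vector and take a single variance
pooled_var <- vapply(groups, function(g) var(unlist(g)), numeric(1))

print(pooled_var)
```

The result is a named numeric vector with one entry per subset, named like "1.D" or "2.M", and the values should agree with the variance column above.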