I'm not very familiar with R. My data frame df looks like this:
df = data.frame(pathway=c("A","A","A","B","B"), S1=c(10,20,25, 15, 20), S2=c(2,4,5,7,8))
rownames(df) = c("G1","G2","G3","G4","G5")
df
   pathway S1 S2
G1       A 10  2
G2       A 20  4
G3       A 25  5
G4       B 15  7
G5       B 20  8
There are not just S1 and S2 but 130 sample columns, so the names run up to S130. There are also 20 different pathways, for example from A to U.
I want to compute the mean of the values in S1, S2, and so on, separately for pathway A, pathway B, and so on. The desired output looks like this:
pathway   S1   S2
A       18.3 3.67
B       17.5 7.5
I can't figure out how to do this. Can someone help? Thanks!
Answer 0 (score: 2)
I would try the following:
library(dplyr)
library(tidyr)
df %>%
gather(key, value, -pathway) %>%
group_by(pathway, key) %>%
summarise(group_mean = mean(value)) %>%
ungroup()
# A tibble: 4 x 3
  pathway key   group_mean
  <fctr>  <chr>      <dbl>
1 A       S1     18.333333
2 A       S2      3.666667
3 B       S1     17.500000
4 B       S2      7.500000
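This computes the mean for every sample column (S1, S2, ..., S130) within each pathway. Afterwards, you can reshape the result into your desired wide layout by adding spread(key, group_mean) at the end of the chain: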
df %>%
gather(key, value, -pathway) %>%
group_by(pathway, key) %>%
summarise(group_mean = mean(value)) %>%
spread(key, group_mean)
# A tibble: 2 x 3
# Groups: pathway [2]
  pathway       S1       S2
* <fctr>     <dbl>    <dbl>
1 A       18.33333 3.666667
2 B       17.50000 7.500000
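As a possible shortcut, the gather/spread round trip can be skipped by summarising every non-grouping column directly. This is only a sketch: it assumes dplyr 1.0.0 or later (where across() is available) and rebuilds the example df from the question so it runs on its own. The name res is illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(pathway = c("A", "A", "A", "B", "B"),
                 S1 = c(10, 20, 25, 15, 20),
                 S2 = c(2, 4, 5, 7, 8))
rownames(df) <- c("G1", "G2", "G3", "G4", "G5")

# Group by pathway, then take the mean of every remaining column.
# Inside summarise(), across(everything(), ...) skips the grouping column.
res <- df %>%
  group_by(pathway) %>%
  summarise(across(everything(), mean))

res
```

With 130 sample columns the call stays the same, since everything() picks up all columns other than pathway.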
Answer 1 (score: 1)
This can easily be done with the aggregate function.
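The code for this answer is not preserved here, so the following is only a sketch of how base R's aggregate() could be applied to the df from the question, not necessarily what the answerer wrote. The name res is illustrative.

```r
# Example data from the question
df <- data.frame(pathway = c("A", "A", "A", "B", "B"),
                 S1 = c(10, 20, 25, 15, 20),
                 S2 = c(2, 4, 5, 7, 8))
rownames(df) <- c("G1", "G2", "G3", "G4", "G5")

# In the formula, "." stands for every column other than pathway,
# so this takes the mean of S1, S2, ... within each pathway.
res <- aggregate(. ~ pathway, data = df, FUN = mean)

res
```

Note that the formula interface drops rows containing NA by default (na.action = na.omit), which may matter with real data.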