How to calculate means based on the contents of another column?

Time: 2018-01-01 23:29:37

Tags: r dataframe

I'm fairly new to R, and my df looks like this:

df = data.frame(pathway=c("A","A","A","B","B"), S1=c(10,20,25, 15, 20), S2=c(2,4,5,7,8))
rownames(df) = c("G1","G2","G3","G4","G5")

df
   pathway S1 S2
G1       A 10  2
G2       A 20  4
G3       A 25  5
G4       B 15  7
G5       B 20  8

There are not just S1 and S2 but 130 such columns, so they go up to S130. There are also 20 different pathways, for example from A to U.

I want to calculate the mean of the values in S1, S2, etc. for pathway A, pathway B, and so on. The desired output looks like this:

 pathway   S1    S2
       A 18.3  3.67
       B 17.5   7.5

I can't figure out how to do this. Can anyone help? Thanks!

2 Answers:

Answer 0 (score: 2)

I would try the following:

library(dplyr)
library(tidyr)

df %>% 
  gather(key, value, -pathway) %>%          # long format: one row per pathway/sample value
  group_by(pathway, key) %>% 
  summarise(group_mean = mean(value)) %>%   # mean per pathway and sample column
  ungroup()

# A tibble: 4 x 3
  pathway   key group_mean
   <fctr> <chr>      <dbl>
1       A    S1  18.333333
2       A    S2   3.666667
3       B    S1  17.500000
4       B    S2   7.500000

This way you can calculate the mean for S1, S2, ..., S130. Afterwards, you can reshape the table into your desired output by adding spread(key, group_mean) at the end of the chain:

df %>% 
  gather(key, value, -pathway) %>% 
  group_by(pathway, key) %>% 
  summarise(group_mean = mean(value)) %>% 
  spread(key, group_mean)

# A tibble: 2 x 3
# Groups:   pathway [2]
  pathway       S1       S2
*  <fctr>    <dbl>    <dbl>
1       A 18.33333 3.666667
2       B 17.50000 7.500000
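
For comparison, the same per-pathway means can be sketched without reshaping to long format and back. This is not part of the answer above; it assumes dplyr 1.0.0 or later, where across() is available, and the name means_by_pathway is made up for illustration.

```r
library(dplyr)

# Same toy data as in the question.
df <- data.frame(
  pathway = c("A", "A", "A", "B", "B"),
  S1 = c(10, 20, 25, 15, 20),
  S2 = c(2, 4, 5, 7, 8)
)
rownames(df) <- c("G1", "G2", "G3", "G4", "G5")

# Group by pathway, then take the mean of every remaining column.
# everything() leaves out the grouping column, so the same call
# should cover S1 through S130 without listing them.
means_by_pathway <- df %>%
  group_by(pathway) %>%
  summarise(across(everything(), mean))

print(means_by_pathway)
```

The result has one row per pathway and one column per sample, which is the layout asked for in the question.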

Answer 1 (score: 1)

This can easily be done with the aggregate function.
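
The code for this answer did not survive in the text, so what follows is only a guess at what it may have looked like: a minimal sketch using the formula interface of base R's aggregate(). The name pathway_means is made up for illustration.

```r
# Same toy data as in the question.
df <- data.frame(
  pathway = c("A", "A", "A", "B", "B"),
  S1 = c(10, 20, 25, 15, 20),
  S2 = c(2, 4, 5, 7, 8)
)
rownames(df) <- c("G1", "G2", "G3", "G4", "G5")

# "." on the left-hand side stands for every column not named on the
# right-hand side, so all sample columns are averaged per pathway.
pathway_means <- aggregate(. ~ pathway, data = df, FUN = mean)

print(pathway_means)
```

One thing to keep in mind: the formula interface drops rows containing NA by default (na.action = na.omit), so if the real data has missing values, the result may differ from a column-by-column mean.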