Summarize data by group and column name

Time: 2019-09-17 05:08:05

Tags: r dplyr tidyverse

I have the following data frame:

library(tidyverse)    
ID <- c('A','A','B','C','D','E','F')
Level1 <- c(20,50,30,10,15,10,NA)
Level2 <- c(40,33,84,NA,20,1,NA)
Level3 <- c(60,40,60,10,25,NA,NA)
Grade1 <- c(20,50,30,10,15,10,NA)
Grade2 <- c(40,33,84,NA,20,1,NA)

DF <- data.frame(ID,Level1,Level2,Level3,Grade1,Grade2)
  ID Level1 Level2 Level3 Grade1 Grade2
1  A     20     40     60     20     40
2  A     50     33     40     50     33
3  B     30     84     60     30     84
4  C     10     NA     10     10     NA
5  D     15     20     25     15     20
6  E     10      1     NA     10      1
7  F     NA     NA     NA     NA     NA

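My goal is to group the data by ID and summarize the columns whose names contain the string "Level" by computing their mean. Ideally, the output should look like this: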

ID        mean (Level1+Level2+Level3)
A         40.5
B         58
C         10
....
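The target numbers appear to be the mean of all non-missing Level values pooled within each ID. For A, that is (20 + 40 + 60 + 50 + 33 + 40) / 6 = 40.5, not the mean of the row sums. Below is a minimal base-R check under that reading. The names `level_cols` and `pooled` are only for illustration.

```r
# Rebuild the Level columns from the question
ID     <- c('A','A','B','C','D','E','F')
Level1 <- c(20,50,30,10,15,10,NA)
Level2 <- c(40,33,84,NA,20,1,NA)
Level3 <- c(60,40,60,10,25,NA,NA)
DF <- data.frame(ID, Level1, Level2, Level3)

# Pick the columns by name pattern, split the rows by ID,
# then average every non-missing Level value within each group
level_cols <- grep("^Level", names(DF), value = TRUE)
pooled <- sapply(split(DF[level_cols], DF$ID),
                 function(d) mean(unlist(d), na.rm = TRUE))
print(pooled)
```

Under this reading, D comes out as 20, E as 5.5, and F as NaN, because all of F's values are missing.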

Here is my code:

DF %>%
  group_by(ID) %>%
  select(starts_with('Level')) %>%
  summarise(mean(.,na.rm = TRUE))

When I run the code, I get the following output:

Adding missing grouping variables: `ID`
# A tibble: 6 x 2
  ID    `mean(., na.rm = TRUE)`
  <fct>                   <dbl>
1 A                          NA
2 B                          NA
3 C                          NA
4 D                          NA
5 E                          NA
6 F                          NA
Warning messages:
1: In mean.default(., na.rm = TRUE) :
  argument is not numeric or logical: returning NA
2: In mean.default(., na.rm = TRUE) :
  argument is not numeric or logical: returning NA
3: In mean.default(., na.rm = TRUE) :
  argument is not numeric or logical: returning NA
4: In mean.default(., na.rm = TRUE) :
  argument is not numeric or logical: returning NA
5: In mean.default(., na.rm = TRUE) :
  argument is not numeric or logical: returning NA
6: In mean.default(., na.rm = TRUE) :
  argument is not numeric or logical: returning NA

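Could someone please help me understand what is wrong with my code? Requirements for a proposed solution: 1) it should select the columns by matching their names against a string, using a dplyr function such as starts_with() or contains(); 2) if possible, I would also like to avoid pivoting or gather functions.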

Thanks for your help.

2 answers:

Answer 0 (score: 0)

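# funs() is soft-deprecated as of dplyr 0.8.0 but still runs there;
# inside funs(), `.` stands for each selected column within a group
DF %>%
  group_by(ID) %>%
  select(starts_with('Level')) %>%
  summarise_all(funs(mean(.,na.rm = TRUE)))

# The non-deprecated form: list() with a ~ lambda
DF %>%
  group_by(ID) %>%
  select(starts_with('Level')) %>%
  summarise_all(list(~mean(.,na.rm = TRUE)))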

Either version gives:

  ID    Level1 Level2 Level3
  <fct>  <dbl>  <dbl>  <dbl>
1 A         35   36.5     50
2 B         30   84       60
3 C         10  NaN       10
4 D         15   20       25
5 E         10    1      NaN
6 F        NaN  NaN      NaN
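Here is why the question's code returns NA. In `summarise(mean(., na.rm = TRUE))`, the `.` is the magrittr placeholder. It is bound to the entire grouped data frame piped in, not to individual columns. `mean()` has no data-frame method, so `mean.default()` warns and returns `NA`, once per group. Inside `funs()` or a `~` lambda, by contrast, `.` stands for one column at a time.

The base-R snippet below reproduces the warning on a small data frame. The names `small`, `msg`, `res` and `col_means` are illustrative.

```r
# A small data frame standing in for the selected Level columns of group A
small <- data.frame(Level1 = c(20, 50), Level2 = c(40, 33))

# mean() dispatches to mean.default(), which rejects a data frame with a warning
msg <- tryCatch(mean(small, na.rm = TRUE),
                warning = function(w) conditionMessage(w))
print(msg)
# [1] "argument is not numeric or logical: returning NA"

# With the warning silenced, the return value is NA
res <- suppressWarnings(mean(small, na.rm = TRUE))

# Applying mean() column by column is what summarise_all() does within each group
col_means <- sapply(small, mean, na.rm = TRUE)
print(col_means)
```

The column-wise means (35 and 36.5) match row A of the output above.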

Answer 1 (score: 0)

Edit: updated the answer to take the mean across the "Level" columns.

DF %>%
  gather(col, value, -ID) %>%
  filter(col %>% str_starts("Level")) %>%
  group_by(ID) %>%
  summarise(mean = mean(value, na.rm = TRUE))

## A tibble: 6 x 2
#  ID     mean
#  <fct> <dbl>
#1 A      40.5
#2 B      58  
#3 C      10  
#4 D      20  
#5 E       5.5
#6 F     NaN  
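The question asks to avoid gather(), so here is a sketch that gets the same per-ID numbers without reshaping. It assumes dplyr with the magrittr pipe, so that `.` inside `mutate()` refers to the piped data frame. It adds two helper columns, hypothetically named `lvl_sum` and `lvl_n`, that hold each row's sum and count of non-missing Level values. It then divides the group totals.

```r
suppressPackageStartupMessages(library(dplyr))

DF <- data.frame(
  ID     = c('A','A','B','C','D','E','F'),
  Level1 = c(20,50,30,10,15,10,NA),
  Level2 = c(40,33,84,NA,20,1,NA),
  Level3 = c(60,40,60,10,25,NA,NA),
  Grade1 = c(20,50,30,10,15,10,NA),
  Grade2 = c(40,33,84,NA,20,1,NA)
)

pooled <- DF %>%
  mutate(
    # per-row sum of the Level* columns, ignoring NAs
    lvl_sum = rowSums(select(., starts_with("Level")), na.rm = TRUE),
    # per-row count of non-missing Level* values
    lvl_n   = rowSums(!is.na(select(., starts_with("Level"))))
  ) %>%
  group_by(ID) %>%
  # pooled mean = total of Level values / number of non-missing Level values
  summarise(mean = sum(lvl_sum) / sum(lvl_n))

print(pooled)
```

The sketch divides pooled sums by pooled counts rather than averaging row means. That choice reproduces the gather() result, including `NaN` (0/0) for F, whose Level values are all missing.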

Original answer: Here is a variation on what Sang won Kim wrote. It works with dplyr 0.8.3, the current CRAN version at the time.

DF %>%
  group_by(ID)  %>%
  summarise_at(vars(starts_with('Level')), mean, na.rm = TRUE)

# A tibble: 6 x 4
  ID    Level1 Level2 Level3
  <fct>  <dbl>  <dbl>  <dbl>
1 A         35   36.5     50
2 B         30   84       60
3 C         10  NaN       10
4 D         15   20       25
5 E         10    1      NaN
6 F        NaN  NaN      NaN
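For readers on dplyr 1.0.0 or later, `summarise_at()` and `summarise_all()` are superseded there by `across()`. Below is a sketch of the same per-column summary in that style, assuming dplyr >= 1.0.0. `.groups = "drop"` simply returns an ungrouped result.

```r
suppressPackageStartupMessages(library(dplyr))

DF <- data.frame(
  ID     = c('A','A','B','C','D','E','F'),
  Level1 = c(20,50,30,10,15,10,NA),
  Level2 = c(40,33,84,NA,20,1,NA),
  Level3 = c(60,40,60,10,25,NA,NA),
  Grade1 = c(20,50,30,10,15,10,NA),
  Grade2 = c(40,33,84,NA,20,1,NA)
)

# One mean per Level* column within each ID; Grade* columns are not selected
per_column <- DF %>%
  group_by(ID) %>%
  summarise(across(starts_with("Level"), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(per_column)
```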