I have the following data frame:
library(tidyverse)
ID <- c('A','A','B','C','D','E','F')
Level1 <- c(20,50,30,10,15,10,NA)
Level2 <- c(40,33,84,NA,20,1,NA)
Level3 <- c(60,40,60,10,25,NA,NA)
Grade1 <- c(20,50,30,10,15,10,NA)
Grade2 <- c(40,33,84,NA,20,1,NA)
DF <- data.frame(ID,Level1,Level2,Level3,Grade1,Grade2)
  ID Level1 Level2 Level3 Grade1 Grade2
1  A     20     40     60     20     40
2  A     50     33     40     50     33
3  B     30     84     60     30     84
4  C     10     NA     10     10     NA
5  D     15     20     25     15     20
6  E     10      1     NA     10      1
7  F     NA     NA     NA     NA     NA
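My goal is to group the data by ID and summarise the columns whose names contain the string "Level" by computing their mean. Ideally, the output should look like this: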
ID mean (Level1+Level2+Level3)
A 40.5
B 58
C 10
....
Here is my code:
DF %>%
group_by(ID) %>%
select(starts_with('Level')) %>%
summarise(mean(.,na.rm = TRUE))
When I run the code, I get the following output:
Adding missing grouping variables: `ID`
# A tibble: 6 x 2
  ID    `mean(., na.rm = TRUE)`
  <fct>                   <dbl>
1 A                          NA
2 B                          NA
3 C                          NA
4 D                          NA
5 E                          NA
6 F                          NA
Warning messages:
1: In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
2: In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
3: In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
4: In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
5: In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
6: In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
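Could you please help me understand what is wrong with my code? For any suggested solution: 1) it should select the columns by matching their names against a string, using functions such as starts_with() or contains() from dplyr. 2) If possible, I would also like to avoid pivot or gather functions.
Thanks for your help.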
Answer 0 (score: 0)
DF %>%
group_by(ID) %>%
select(starts_with('Level')) %>%
summarise_all(funs(mean(.,na.rm = TRUE)))
or
DF %>%
group_by(ID) %>%
select(starts_with('Level')) %>%
summarise_all(list(~mean(.,na.rm = TRUE)))
This gives you:
  ID    Level1 Level2 Level3
  <fct>  <dbl>  <dbl>  <dbl>
1 A         35   36.5     50
2 B         30   84       60
3 C         10  NaN       10
4 D         15   20       25
5 E         10    1      NaN
6 F        NaN  NaN      NaN
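The output above has one mean per Level column, whereas the question asks for a single mean per ID pooled over all Level columns. A minimal sketch of one way to get that without gathering or pivoting is shown below. It assumes dplyr 1.1.0 or later, where pick() is available, and the names `pooled` and `mean` are only illustrative.

```r
library(dplyr)

# Same data as in the question
DF <- data.frame(
  ID     = c('A','A','B','C','D','E','F'),
  Level1 = c(20,50,30,10,15,10,NA),
  Level2 = c(40,33,84,NA,20,1,NA),
  Level3 = c(60,40,60,10,25,NA,NA),
  Grade1 = c(20,50,30,10,15,10,NA),
  Grade2 = c(40,33,84,NA,20,1,NA)
)

pooled <- DF %>%
  group_by(ID) %>%
  # pick() returns the selected columns for the current group as a tibble;
  # unlist() flattens them into one numeric vector so that mean() sees
  # every Level value of that ID at once (assumes dplyr >= 1.1.0)
  summarise(mean = mean(unlist(pick(starts_with("Level"))), na.rm = TRUE))

pooled
```

For ID F every Level value is NA, so after dropping the NAs the mean of an empty vector is NaN.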
Answer 1 (score: 0)
Edit: updated the answer to summarise across the "Level" columns.
DF %>%
gather(col, value, -ID) %>%
filter(col %>% str_starts("Level")) %>%
group_by(ID) %>%
summarise(mean = mean(value, na.rm = TRUE))
## A tibble: 6 x 2
# ID mean
# <fct> <dbl>
#1 A 40.5
#2 B 58
#3 C 10
#4 D 20
#5 E 5.5
#6 F NaN
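In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A rough equivalent of the snippet above is sketched here, assuming that tidyr version. Selecting the Level columns directly inside pivot_longer() makes the separate filter() step unnecessary. The name `long_means` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
DF <- data.frame(
  ID     = c('A','A','B','C','D','E','F'),
  Level1 = c(20,50,30,10,15,10,NA),
  Level2 = c(40,33,84,NA,20,1,NA),
  Level3 = c(60,40,60,10,25,NA,NA),
  Grade1 = c(20,50,30,10,15,10,NA),
  Grade2 = c(40,33,84,NA,20,1,NA)
)

long_means <- DF %>%
  # Stack only the Level columns into name/value pairs (assumes tidyr >= 1.0.0)
  pivot_longer(starts_with("Level"), names_to = "col", values_to = "value") %>%
  group_by(ID) %>%
  # One pooled mean per ID over all stacked Level values
  summarise(mean = mean(value, na.rm = TRUE))

long_means
```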
Original answer
This is a variation on what Sang won Kim wrote; it works with the current CRAN release, dplyr 0.8.3.
DF %>%
group_by(ID) %>%
summarise_at(vars(starts_with('Level')), mean, na.rm = TRUE)
# A tibble: 6 x 4
  ID    Level1 Level2 Level3
  <fct>  <dbl>  <dbl>  <dbl>
1 A         35   36.5     50
2 B         30   84       60
3 C         10  NaN       10
4 D         15   20       25
5 E         10    1      NaN
6 F        NaN  NaN      NaN
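In dplyr 1.0.0 and later, summarise_at() and summarise_all() are superseded by across(). A sketch of the same per-column summary in that style follows, assuming dplyr 1.0.0 or later. The name `per_column` is only illustrative.

```r
library(dplyr)

# Same data as in the question
DF <- data.frame(
  ID     = c('A','A','B','C','D','E','F'),
  Level1 = c(20,50,30,10,15,10,NA),
  Level2 = c(40,33,84,NA,20,1,NA),
  Level3 = c(60,40,60,10,25,NA,NA),
  Grade1 = c(20,50,30,10,15,10,NA),
  Grade2 = c(40,33,84,NA,20,1,NA)
)

per_column <- DF %>%
  group_by(ID) %>%
  # Apply mean() to each Level column separately within each ID group
  # (assumes dplyr >= 1.0.0, where across() is available)
  summarise(across(starts_with("Level"), ~ mean(.x, na.rm = TRUE)))

per_column
```

No select() step is needed here, so the "Adding missing grouping variables" message from the question does not appear.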