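I searched for similar questions but could not find anything close to this.
I have a data frame df with a few hundred rows and several variables. The first variable is level, which ranges from 1 to 8.
For example: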
df <- data.frame(
  level = c(1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8),
  CODE = c("1234", "3452", "1234", "7654", "6547", "6546", "7683", "6543", "7683", "6543", "7683"),
  ADD_ALLOW_MEAL = c(NA, "Y", "Y", "N", "N", NA, NA, "Y", "Y", "N", "N"),
  ALLOW_MEALLOW = c(NA, 40, 60, NA, NA, NA, NA, 50, 70, NA, NA)
)
> df
   level CODE ADD_ALLOW_MEAL ALLOW_MEALLOW
1      1 1234           <NA>            NA
2      1 3452              Y            40
3      1 1234              Y            60
4      2 7654              N            NA
5      2 6547              N            NA
6      3 6546           <NA>            NA
7      4 7683           <NA>            NA
8      5 6543              Y            50
9      6 7683              Y            70
10     7 6543              N            NA
11     8 7683              N            NA
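I need to create a new data frame with only 8 rows (one per level in df). Normally I would use the simple approach:
df %>%
  group_by(level) %>%
  summarise()
The problem is that I need to create several highly customized columns, each computed per level on filtered data.
Example: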
df %>%
  group_by(level) %>%
  summarise(
    Meal_Average = filter(., ADD_ALLOW_MEAL =="Y" & ALLOW_MEALLOW>0) %>% {ifelse(str_detect(.$CODE, "2")=="TRUE", round(mean(.$ALLOW_MEALLOW, na.rm = TRUE),3), NA_real_ )}
  )
I get the following error:
Column `Meal_Average` must be length 1 (a summary value), not 4
The result I want is:
  level Meal_Average
1     1           50
2     2           NA
3     3           NA
4     4           NA
5     5           NA
6     6           NA
7     7           NA
8     8           NA
Any ideas on how I can do this?
Thanks!
Answer 0 (score: 0)
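Here is an idea using dplyr. I simply replace the ALLOW_MEALLOW values that fail your conditions with 0 before averaging (in this sample the zeros never share a level with a qualifying value, so the level-1 mean is unaffected), i.e.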
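library(dplyr)

df %>%
  # Zero out values that fail the conditions; & binds tighter than |,
  # so the parentheses only make the existing grouping explicit
  mutate(ALLOW_MEALLOW = replace(ALLOW_MEALLOW,
                                 (ADD_ALLOW_MEAL == 'N' & ALLOW_MEALLOW < 0) | !grepl('2', CODE),
                                 0)) %>%
  group_by(level) %>%
  summarise(new_mean = mean(ALLOW_MEALLOW, na.rm = TRUE))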
which gives,
# A tibble: 8 x 2
  level new_mean
  <dbl>    <dbl>
1     1       50
2     2        0
3     3        0
4     4        0
5     5        0
6     6        0
7     7        0
8     8        0
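One caveat worth illustrating: a value replaced by 0 is still averaged in whenever it shares a level with a kept value, whereas an NA is dropped by na.rm = TRUE. The following is a minimal base R sketch using made-up vectors (they are not taken from df) to show the difference:

```r
# Hypothetical values for a single level:
# one qualifying row (40) and one non-qualifying row
with_zero <- c(40, 0)
with_na   <- c(40, NA)

mean_zero <- mean(with_zero, na.rm = TRUE)  # the 0 is counted, giving 20
mean_na   <- mean(with_na, na.rm = TRUE)    # the NA is dropped, giving 40

# A level with no qualifying rows at all yields NaN rather than NA
mean_empty <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
```

So if a level can mix qualifying and non-qualifying rows, replacing with NA keeps the mean restricted to the qualifying rows, at the cost of NaN for levels where nothing qualifies.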
Note: you can replace the 0 with NA as usual.
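If the non-qualifying rows should be excluded rather than counted as 0, the condition can instead be evaluated per group. The error in the question arises because the dot (.) inside summarise() refers to the whole piped data frame rather than the current group, so filter(., ...) returns all 4 matching rows at once. Below is a minimal sketch, assuming the intent is to average ALLOW_MEALLOW over rows where ADD_ALLOW_MEAL is "Y", the value is positive, and CODE contains "2". The name meal_average is a hypothetical helper introduced here, not part of dplyr.

```r
library(dplyr)

df <- data.frame(
  level = c(1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8),
  CODE = c("1234", "3452", "1234", "7654", "6547", "6546",
           "7683", "6543", "7683", "6543", "7683"),
  ADD_ALLOW_MEAL = c(NA, "Y", "Y", "N", "N", NA, NA, "Y", "Y", "N", "N"),
  ALLOW_MEALLOW = c(NA, 40, 60, NA, NA, NA, NA, 50, 70, NA, NA)
)

# Hypothetical helper: mean of the qualifying amounts within one group,
# or NA when no row in the group qualifies
meal_average <- function(code, flag, amount) {
  keep <- !is.na(flag) & flag == "Y" &
    !is.na(amount) & amount > 0 &
    grepl("2", code, fixed = TRUE)
  if (!any(keep)) {
    return(NA_real_)
  }
  round(mean(amount[keep]), 3)
}

# Inside summarise(), bare column names refer to the current group only
result <- df %>%
  group_by(level) %>%
  summarise(Meal_Average = meal_average(CODE, ADD_ALLOW_MEAL, ALLOW_MEALLOW))
```

For the sample data this gives 50 for level 1 and NA for every other level, which matches the desired output in the question. Keeping the logic in a small function also makes it easy to add further custom columns to the same summarise() call.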