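I have the following data.frame. I need to create a sixth variable (SAT_NEWS) as follows: if the respondent answered "Very well" or "Somewhat well" on three of the four $ medwell_ variables, the new variable takes the value SAT; otherwise it is NON_SAT.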
'data.frame': 41953 obs. of 5 variables:
$ trust_gov : Factor w/ 6 levels "A lot","Somewhat",..: 1 2 2 2 1 2 4 2 2 2 ...
$ medwell_accuracy: Factor w/ 7 levels "Very well","Somewhat well",..: 2 4 2 3 4 2 1 1 1 1 ...
$ medwell_leaders : Factor w/ 7 levels "Very well","Somewhat well",..: 2 3 2 4 4 3 1 2 1 1 ...
$ medwell_unbiased: Factor w/ 7 levels "Very well","Somewhat well",..: 4 4 2 4 3 2 1 2 1 3 ...
$ medwell_coverage: Factor w/ 7 levels "Very well","Somewhat well",..: 2 4 1 3 3 2 1 1 2 3 ...
- attr(*, "variable.labels")= Named chr "ID. Respondent ID" "Survey" "Country" "QSPLIT. Split form A or B" ...
..- attr(*, "names")= chr "ID" "survey" "Country" "qsplit" ...
- attr(*, "codepage")= int 65001
Can you help me?
Answer 0 (score: 1)
Unfortunately, there is no %in% method for data frames, so a little extra work is needed. In base R, we can use
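# positions of the medwell_ columns
nm <- grep("medwell_", names(df))
# apply() returns one column per respondent, so colSums() counts the matching answers per respondent
num <- colSums(apply(df[, nm], 1, `%in%`, c("Very well", "Somewhat well")))
# exactly three matches give "SAT"; use num >= 3 if "at least three" is what you mean
df$new <- ifelse(num == 3, "SAT", "NON_SAT")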
while with dplyr,
library(dplyr)
library(purrr)  # provides map2_dfr()

df %>%
  mutate(
    new = ifelse(
      select(., contains("medwell_")) %>%
        map2_dfr(list(c("Very well", "Somewhat well")), `%in%`) %>%
        rowSums() == 3, "SAT", "NON_SAT"
    )
  )
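
To check the logic on something small, here is a minimal base R sketch on a toy data frame. The three rows of responses and the factor levels are made up for illustration; only the column names come from the question. It also assumes that "three of the four" means "at least three" (hence `>= 3`), which may not be the intended reading, and it writes the result to SAT_NEWS as named in the question.

```r
# Toy data: the responses and levels are hypothetical, only the column names mirror the question
lv <- c("Very well", "Somewhat well", "Not too well", "Not well at all")
df <- data.frame(
  medwell_accuracy = factor(c("Very well", "Somewhat well", "Not too well"), levels = lv),
  medwell_leaders  = factor(c("Very well", "Very well", "Not too well"), levels = lv),
  medwell_unbiased = factor(c("Somewhat well", "Not too well", "Very well"), levels = lv),
  medwell_coverage = factor(c("Very well", "Somewhat well", "Not well at all"), levels = lv)
)

# Names of the medwell_ columns
cols <- grep("^medwell_", names(df), value = TRUE)

# One logical column per variable: TRUE where the answer is one of the two positive ones
hits <- sapply(df[cols], `%in%`, c("Very well", "Somewhat well"))

# Count the positive answers per respondent (row)
num <- rowSums(hits)

# Assumption: "three of the four" is read as "at least three"
df$SAT_NEWS <- ifelse(num >= 3, "SAT", "NON_SAT")
print(df$SAT_NEWS)
```

Working column by column with sapply() avoids the row-wise apply() and the transposed result that makes colSums() necessary above. Note that with a single-row data frame sapply() simplifies to a vector rather than a matrix, so rowSums() would need the result wrapped in a matrix first.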