Conditionally creating a new variable

Time: 2019-02-17 00:03:50

Tags: r dplyr

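I have the following data.frame. I need to create a sixth variable (SAT_NEWS) as follows: if the respondent answered "Very well" or "Somewhat well" in three of the four medwell_ variables, the new variable takes the value SAT; otherwise it is NON_SAT.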

'data.frame':   41953 obs. of  5 variables:
 $ trust_gov       : Factor w/ 6 levels "A lot","Somewhat",..: 1 2 2 2 1 2 4 2 2 2 ...
 $ medwell_accuracy: Factor w/ 7 levels "Very well","Somewhat well",..: 2 4 2 3 4 2 1 1 1 1 ...
 $ medwell_leaders : Factor w/ 7 levels "Very well","Somewhat well",..: 2 3 2 4 4 3 1 2 1 1 ...
 $ medwell_unbiased: Factor w/ 7 levels "Very well","Somewhat well",..: 4 4 2 4 3 2 1 2 1 3 ...
 $ medwell_coverage: Factor w/ 7 levels "Very well","Somewhat well",..: 2 4 1 3 3 2 1 1 2 3 ...
 - attr(*, "variable.labels")= Named chr  "ID. Respondent ID" "Survey" "Country" "QSPLIT. Split form A or B" ...
  ..- attr(*, "names")= chr  "ID" "survey" "Country" "qsplit" ...
 - attr(*, "codepage")= int 65001

Can you help me?

1 answer:

Answer 0: (score: 1)

Unfortunately, there is no %in% method for data frames, so a little extra work is needed. In base R, we can use

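# Column positions of the four medwell_ items
nm <- grep("medwell_", names(df))
# apply() over rows yields one column per respondent, so colSums() counts the favourable answers
num <- colSums(apply(df[, nm], 1, `%in%`, c("Very well", "Somewhat well")))
# Exactly three favourable answers -> "SAT"; use num >= 3 if "at least three" is intended
df$new <- ifelse(num == 3, "SAT", "NON_SAT")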
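
To make the counting step easier to check, here is a minimal base R sketch on toy data. The three-row data frame, the level set `lv`, and the column names `SAT_NEWS_exact` and `SAT_NEWS_atleast` are assumptions made for illustration; only the medwell_ column names and the two favourable answers come from the question. It uses sapply() over columns instead of apply() over rows, which should give the same counts.

```r
# Toy stand-in for the survey data (hypothetical values, real column names)
lv <- c("Very well", "Somewhat well", "Not too well", "Not well at all")
df <- data.frame(
  medwell_accuracy = factor(c("Very well", "Somewhat well", "Not too well"), levels = lv),
  medwell_leaders  = factor(c("Very well", "Somewhat well", "Not too well"), levels = lv),
  medwell_unbiased = factor(c("Very well", "Not too well", "Not well at all"), levels = lv),
  medwell_coverage = factor(c("Very well", "Somewhat well", "Very well"), levels = lv)
)

good <- c("Very well", "Somewhat well")
nm <- grep("^medwell_", names(df))

# One logical column per medwell_ item; %in% compares factors by their labels
is_good <- sapply(df[nm], `%in%`, good)

# Number of favourable answers per respondent: 4, 3, 1 for the toy rows
num <- rowSums(is_good)

# "Exactly three" (as in the answer above) versus "at least three"
df$SAT_NEWS_exact   <- ifelse(num == 3, "SAT", "NON_SAT")
df$SAT_NEWS_atleast <- ifelse(num >= 3, "SAT", "NON_SAT")
```

The first toy respondent gives a favourable answer on all four items, so the two readings of "three of the four" disagree for that row: NON_SAT under `num == 3`, SAT under `num >= 3`.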

With dplyr (and purrr, which provides map2_dfr), we can use

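library(dplyr)
library(purrr)  # provides map2_dfr()

df %>% 
  mutate(
    new = ifelse(
      # one logical column per medwell_ item, then count the TRUEs per row
      select(., contains("medwell_")) %>% 
        map2_dfr(list(c("Very well", "Somewhat well")), `%in%`) %>%
        rowSums() == 3, "SAT", "NON_SAT"
    )
  )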
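
As a possible alternative that avoids purrr, the same count can be sketched with across(), assuming dplyr 1.0.0 or later (which is newer than this answer). The toy data frame, the helper column `n_good`, and the result name `res` are hypothetical; this is not the answerer's method, just one way the idea might be written today.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which introduced across()

# Toy stand-in for the survey data (hypothetical values, real column names)
lv <- c("Very well", "Somewhat well", "Not too well", "Not well at all")
df <- data.frame(
  medwell_accuracy = factor(c("Very well", "Somewhat well", "Not too well"), levels = lv),
  medwell_leaders  = factor(c("Very well", "Somewhat well", "Not too well"), levels = lv),
  medwell_unbiased = factor(c("Very well", "Not too well", "Not well at all"), levels = lv),
  medwell_coverage = factor(c("Very well", "Somewhat well", "Very well"), levels = lv)
)

good <- c("Very well", "Somewhat well")

res <- df %>%
  mutate(
    # across() returns a data frame of logicals; rowSums() counts TRUEs per row
    n_good   = rowSums(across(starts_with("medwell_"), ~ .x %in% good)),
    # exactly three favourable answers, matching the answer above
    SAT_NEWS = if_else(n_good == 3, "SAT", "NON_SAT")
  )
```

Keeping `n_good` as its own column makes it easy to switch the rule to `n_good >= 3` if "at least three" is what the question intends.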