Using case_when to create a group identifier variable

Posted: 2019-08-31 01:25:59

Tags: r dplyr data-manipulation case-when

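I have a dataset of extended family members, so there is a variable identifying each respondent's relationship to the household head (parent, child, sibling, etc.).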

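I want to create a variable that identifies each person's "generation group". My groups are gen0 through gen3, defined in the sample data below.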

I tried to create the new "generation" variable with case_when, but the new variable "2017_generation" is still filled entirely with NA values. Any idea what I'm doing wrong? (Sample data below.)

gen0 <- c("grandparent", "grandparent_ofwife")
gen1 <- c("parent", "parent_inlaw", "parent_ofcohab")
gen2 <- c("head", "wife_legal", "wife_cohabit", "husband_legal", "y1_cohab")
gen3 <- c("child", "child_step", "child_ofwife", "child_inlaw", "child_foster", "child_1y_cohab")

1 Answer

Answer (score: 1)

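The following works. I think the main problem was the quotes around the variable names: inside case_when, a quoted name such as "x2017_relation_head" is just a string, so %in% compares that text rather than the column's values, no condition ever matches, and case_when returns NA for every row. Also, a syntactic R name cannot start with a digit, so a column like 2017_relation_head has to be referenced in backticks; the code below simply renames it to x2017_relation_head.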
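A minimal sketch of why quoting breaks the match, assuming the question's code quoted the column names inside case_when as the answer suggests (the question's own code is not shown). The data frame and its values are hypothetical; check.names = FALSE keeps the digit-leading name that data.frame() would otherwise rewrite to X2017_relation_head.

```r
library(dplyr)

# Hypothetical toy data; the column name starts with a digit, like the question's "2017_..." variables
gen2 <- c("head", "wife_legal")
gen3 <- c("child", "child_step")
df <- data.frame(`2017_relation_head` = c("head", "child", "child_step"),
                 check.names = FALSE)

# Quoted name: "2017_relation_head" is a one-element string, so %in% tests
# that literal text (not the column), nothing matches, and every row gets NA
wrong <- df %>%
  mutate(gen = case_when("2017_relation_head" %in% gen2 ~ "gen2",
                         "2017_relation_head" %in% gen3 ~ "gen3"))

# Backticks refer to the column itself, even though its name starts with a digit
right <- df %>%
  mutate(gen = case_when(`2017_relation_head` %in% gen2 ~ "gen2",
                         `2017_relation_head` %in% gen3 ~ "gen3"))

wrong$gen  # all NA
right$gen  # "gen2" "gen3" "gen3"
```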

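library(dplyr)
# gen0 comes from the question; repeated here so the example runs on its own
gen0 <- c("grandparent", "grandparent_ofwife")
gen1 <- c("parent", "parent_inlaw", "parent_ofcohab")
gen2 <- c("head", "wife_legal", "wife_cohabit", "husband_legal", "y1_cohab")
gen3 <- c("child", "child_step", "child_ofwife", "child_inlaw", "child_foster", "child_1y_cohab")
# Random sample data; the "x" prefix keeps the column name syntactic
dat <- data.frame("x2017_relation_head" = sample(c(gen0, gen1, gen2, gen3),
                                                size = 100, replace = TRUE))
# Older R versions turn strings into factors by default, so force character
dat$x2017_relation_head <- as.character(dat$x2017_relation_head)
# Column names are unquoted inside case_when so they refer to the column's values
dat2 <- dat %>% mutate(genx =
          case_when(x2017_relation_head %in% gen0 ~ "gen0",
                    x2017_relation_head %in% gen1 ~ "gen1",
                    x2017_relation_head %in% gen2 ~ "gen2",
                    x2017_relation_head %in% gen3 ~ "gen3"))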
head(dat2)
  x2017_relation_head genx
1      child_1y_cohab gen3
2         child_inlaw gen3
3          child_step gen3
4       husband_legal gen2
5          child_step gen3
6         child_inlaw gen3
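The version above still returns NA for any relationship label that is not listed in gen0 to gen3 (the question mentions siblings, for example, which none of the groups contains). A minimal sketch of adding a catch-all branch so unmatched labels are easy to spot; the rows and the "unassigned" label are hypothetical, and TRUE ~ is the classic case_when fallback (newer dplyr versions also offer a .default argument).

```r
library(dplyr)

gen0 <- c("grandparent", "grandparent_ofwife")
gen1 <- c("parent", "parent_inlaw", "parent_ofcohab")
gen2 <- c("head", "wife_legal", "wife_cohabit", "husband_legal", "y1_cohab")
gen3 <- c("child", "child_step", "child_ofwife", "child_inlaw", "child_foster", "child_1y_cohab")

# Hypothetical rows: "sibling" belongs to none of the four groups above
dat <- data.frame(x2017_relation_head = c("head", "child_step", "sibling", "parent"),
                  stringsAsFactors = FALSE)

dat2 <- dat %>%
  mutate(genx = case_when(
    x2017_relation_head %in% gen0 ~ "gen0",
    x2017_relation_head %in% gen1 ~ "gen1",
    x2017_relation_head %in% gen2 ~ "gen2",
    x2017_relation_head %in% gen3 ~ "gen3",
    TRUE ~ "unassigned"  # catch-all for labels not listed in gen0 to gen3
  ))

dat2$genx  # "gen2" "gen3" "unassigned" "gen1"

# Labels that still need to be assigned to a group
unassigned <- unique(dat2$x2017_relation_head[dat2$genx == "unassigned"])
unassigned  # "sibling"
```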