Using a dynamic variable name in dplyr case_when()

Time: 2020-08-04 10:27:54

Tags: r dplyr

I have a function in R that uses case_when:

myfunction <- function(df, col, case_name, cntl_name) {

object <- df %>%
    mutate(
        class = case_when(
            col == case_name ~ 1,
            col == cntl_name ~ 0,
         )
     )
return(object)
}

So, if I have this object:

df <- structure(list(id = c("ID1", "ID2", 
"ID3", "ID4", "ID5"
), phenotype = c("blue", "blue", "red", 
"green", "red"), treatment = c("treat1", "treat2", 
"none", "none", "none"), weeks_of_treatment = c(0, 0, 0, 0, 0
)), row.names = c("ID1", "ID2", 
"ID3", "ID4", "ID5"
), class = "data.frame")

> df
     id phenotype treatment weeks_of_treatment
ID1 ID1      blue    treat1                  0
ID2 ID2      blue    treat2                  0
ID3 ID3       red      none                  0
ID4 ID4     green      none                  0
ID5 ID5       red      none                  0

and then run:

newdf <- myfunction(df, "phenotype", "red", "blue")

it should return a data frame that looks like this:

   id phenotype treatment weeks_of_treatment class
1 ID1      blue    treat1                  0     0
2 ID2      blue    treat2                  0     0
3 ID3       red      none                  0     1
4 ID4     green      none                  0    NA
5 ID5       red      none                  0     1

But it doesn't. It returns the following instead:

> newdf
   id phenotype treatment weeks_of_treatment class
1 ID1      blue    treat1                  0    NA
2 ID2      blue    treat2                  0    NA
3 ID3       red      none                  0    NA
4 ID4     green      none                  0    NA
5 ID5       red      none                  0    NA

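It fails to recognize the variable col as the column phenotype. Does anyone know how to pass a dynamic variable into case_when?

I have tried other solutions for variables in dplyr (for example, using double brackets on col, as in [[col]]), but couldn't find one that works.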

1 Answer:

Answer 0: (score: 1)

myfunction <- function(df, col, case_name, cntl_name) {
  object <- df %>%
    mutate(
      class = case_when(
        {{col}} == case_name ~ 1,
        {{col}} == cntl_name ~ 0,
      )
    )
  return(object)
}

myfunction(df, phenotype, "red", "blue")
   id phenotype treatment weeks_of_treatment class
1 ID1      blue    treat1                  0     0
2 ID2      blue    treat2                  0     0
3 ID3       red      none                  0     1
4 ID4     green      none                  0    NA
5 ID5       red      none                  0     1
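
Note that this version takes the column as a bare name (phenotype), not as the string "phenotype". If the string interface from the question has to be kept, one option is the `.data` pronoun. The following is only a minimal sketch: the name `myfunction_chr` is hypothetical, and the data frame is a trimmed copy of the one in the question.

```r
library(dplyr)

# Trimmed copy of the question's data (only the columns needed here)
df <- data.frame(
  id = paste0("ID", 1:5),
  phenotype = c("blue", "blue", "red", "green", "red"),
  stringsAsFactors = FALSE
)

# Why the original returned NA everywhere: inside mutate(), `col` is just
# the string "phenotype", so `col == case_name` is a single FALSE and no
# case_when() branch ever matches.
"phenotype" == "red"

# Hypothetical string-based variant: .data[[col]] looks up the column
# whose name is stored in the string `col`.
myfunction_chr <- function(df, col, case_name, cntl_name) {
  df %>%
    mutate(
      class = case_when(
        .data[[col]] == case_name ~ 1,
        .data[[col]] == cntl_name ~ 0
      )
    )
}

newdf <- myfunction_chr(df, "phenotype", "red", "blue")
newdf
```

Rows matching neither value (here "green") fall through every branch and get NA, as the question expects.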

Personally, I prefer

myfunction <- function(df, col, case_name, cntl_name) {
  qCol <- enquo(col)
  object <- df %>%
    mutate(
      class = case_when(
        !! qCol == case_name ~ 1,
        !! qCol == cntl_name ~ 0,
      )
    )
  return(object)
}

because it makes the separation between environment variables and data frame variables explicit.
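
If you want that separation to be explicit on the right-hand side of the comparisons as well, the `.env` pronoun can be combined with the quosure. This is a sketch under assumptions, not part of the original function: `myfunction_env` is a hypothetical name, and the extra `case_name` column is made up to show what happens when a column shadows a function argument.

```r
library(dplyr)

df <- data.frame(
  phenotype = c("blue", "blue", "red", "green", "red"),
  case_name = "green",  # made-up column that shadows the argument name
  stringsAsFactors = FALSE
)

myfunction_env <- function(df, col, case_name, cntl_name) {
  qCol <- enquo(col)  # capture the bare column name passed by the caller
  df %>%
    mutate(
      class = case_when(
        # !!qCol is resolved in the data frame; .env$... forces lookup of
        # the function arguments, not a column with the same name
        !!qCol == .env$case_name ~ 1,
        !!qCol == .env$cntl_name ~ 0
      )
    )
}

res <- myfunction_env(df, phenotype, "red", "blue")
res
```

Without `.env$`, `case_name` inside mutate() would pick up the made-up column first, and the "red" rows would no longer be coded as 1.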

When working with NSE (non-standard evaluation), the link in my comment is my go-to page.