Conditional mutate based on the result of coalesce in dplyr

Time: 2017-09-12 16:05:23

Tags: r if-statement dplyr coalesce case-when

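I'm teaching myself R from scratch, mostly by doing things, then reading posts like these, and a lot of trial and error. Sometimes I hit a wall and reach out.

I've hit a wall. I have dplyr 0.7 installed. I have a tibble with a column, call it contract_key, which I added by applying mutate(coalesce()) to three other columns in the tibble. Here is some sample data: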

product <- c("655393265191","655393265191","168145850127","168145850127","350468621217","350468621217","977939797847",NA,"928893912852")
supplier <- c("person5","person3","person10","person5","person11","person5","person11","person14","person5")
vendor <- c("org2","org3","org3","org2","org1","org2","org1","org5","org2")
quantity <- c(7,5,6,1,2,1,18,2,2)
gross <- c(0.0419,0.0193,0.0439,0.0069,0.0027,0.0055,0.0233,NA,0.0004)

library(dplyr)
# tibble() replaces data_frame(), which is deprecated in newer tibble releases
df <- tibble(product,supplier,vendor,quantity,gross)

Here is how I generate contract_key:
df <- df %>% 
  mutate(contract_key = coalesce(product,supplier,vendor))
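
For readers less familiar with coalesce(), here is a minimal sketch of what it does, element by element. The vectors a, b and d are made-up toy values, not the data above.

```r
library(dplyr)

# Toy vectors (hypothetical values, not the question's data)
a <- c("x1", NA, NA)
b <- c("y1", "y2", NA)
d <- c("z1", "z2", "z3")

# coalesce() returns, position by position, the first non-missing value
key <- coalesce(a, b, d)
key
```

So the result only records the value that was picked, not which input it came from, which is why a separate contract_level column is needed.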

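I now want to add another column that classifies the contents of contract_key according to which of the three columns supplied the value (via coalesce()). So if contract_key = "person5", for example, the new column contract_level would be "supplier". And contract_key = "org2" would map to contract_level = "vendor", and so on.

Basically, I'm going to use contract_level as a join variable for another tibble.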

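I'm stumped. I've tried if_else, and I gathered that I shouldn't bother with case_when (because it is inside a mutate()). I've also tried nested if_else, to no avail.

It's probably some basic R syntax I don't know, something to do with dot notation. If someone provides an answer, I'll work backwards until I figure out what you did. (And I'll have learned a new lesson in R!)

Thanks!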

1 Answer:

Answer 0: (score: 2)

How about this:

df %>% mutate(contract_key = coalesce(product,supplier,vendor),
              contract_level = case_when(contract_key %in% product ~ "product",
                                         contract_key %in% supplier ~ "supplier",
                                         contract_key %in% vendor ~ "vendor",
                                         TRUE ~ "none"))
       product supplier vendor quantity  gross contract_key contract_level
1 655393265191  person5   org2        7 0.0419 655393265191        product
2 655393265191  person3   org3        5 0.0193 655393265191        product
3 168145850127 person10   org3        6 0.0439 168145850127        product
4 168145850127  person5   org2        1 0.0069 168145850127        product
5 350468621217 person11   org1        2 0.0027 350468621217        product
6 350468621217  person5   org2        1 0.0055 350468621217        product
7 977939797847 person11   org1       18 0.0233 977939797847        product
8         <NA> person14   org5        2     NA     person14       supplier
9 928893912852  person5   org2        2 0.0004 928893912852        product
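
One caveat with the %in% version: contract_key %in% product tests membership in the whole product column rather than row by row, so a key that came from supplier but also happens to appear somewhere in product would be labelled "product". A minimal sketch that mirrors coalesce()'s row-wise order by testing is.na() instead is shown below. The toy tibble is hypothetical and built to trigger exactly that collision.

```r
library(dplyr)

# Toy data (hypothetical): row 2's supplier value "p1" also occurs in product
toy <- tibble(
  product  = c("p1", NA, NA),
  supplier = c("s1", "p1", NA),
  vendor   = c("v1", "v2", "v3")
)

toy <- toy %>%
  mutate(
    contract_key = coalesce(product, supplier, vendor),
    # Conditions are checked in order, row by row, like coalesce() itself
    contract_level = case_when(
      !is.na(product)  ~ "product",
      !is.na(supplier) ~ "supplier",
      !is.na(vendor)   ~ "vendor",
      TRUE             ~ "none"
    )
  )

toy
```

On the sample data from the question, where the three columns share no values, both versions should give the same labels.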

Other options that need less code:

df %>% mutate(contract_key = coalesce(product,supplier,vendor),
              contract_level = if_else(!is.na(product), 'product', 
                                       if_else(!is.na(supplier), 'supplier', 'vendor')))

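# `.` is the piped-in df: per row, take the name of the first non-NA column
# (relies on product, supplier, vendor being the leading columns)
df %>% mutate(contract_key = coalesce(product,supplier,vendor),
              contract_level = apply(., 1, function(x) names(.)[min(which(!is.na(x)))]))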
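
Since the question mentions using contract_level as a join variable, here is a rough sketch of that last step. The lookup table level_info and its handling column are invented purely for illustration, and keyed stands in for the result of any of the approaches above.

```r
library(dplyr)

# Stand-in for the mutated data: two rows with already-computed levels
keyed <- tibble(
  contract_key   = c("655393265191", "person14"),
  contract_level = c("product", "supplier")
)

# Hypothetical lookup table; the `handling` column is invented for this example
level_info <- tibble(
  contract_level = c("product", "supplier", "vendor"),
  handling       = c("by item", "by person", "by organisation")
)

# Attach the per-level information using contract_level as the join key
joined <- left_join(keyed, level_info, by = "contract_level")
joined
```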