Skipping elements with Map() and match() in R

Time: 2017-10-20 16:45:06

Tags: r

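I want to recode the values in the Df1 data frame using the lookup table in Df2, so that I end up with a data frame like Df3.

My current code almost does this, but it has two problems. First, it introduces NA wherever there is no match: for example, the aed_bloodpr value "1,2" in Df1 has no matching level in Df2, so it becomes NA. Second, when a variable in Df1 cannot be mapped to Df2 at all, the code does not run and stops with an error message.

I have looked at the nomatch argument of match() and at the .default argument, but I can't figure out how to use them with Map() so that I end up with Df3.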
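To make the first problem concrete, here is a minimal sketch of how match() behaves on the aed_bloodpr values. The vector names (x, levels_known, labels_known) are made up for illustration and are not from the original code. It suggests that nomatch alone cannot keep the original value, because nomatch only changes the index that is returned (it is coerced to integer), so the fallback has to be applied in a separate step.

```r
# Hypothetical vectors for illustration; values taken from aed_bloodpr.
x            <- c("1,2", "2", "1", "1")
levels_known <- c("1", "2")
labels_known <- c("high", "normal")

# Default: unmatched elements give NA, which turns into an NA label.
idx <- match(x, levels_known)
labels_known[idx]

# nomatch = 0L: unmatched elements give index 0, which is silently dropped
# when subsetting, so the result is shorter than x.
idx0 <- match(x, levels_known, nomatch = 0L)
labels_known[idx0]

# One way to keep unmatched values: fall back to the original element.
recoded <- ifelse(is.na(idx), x, labels_known[idx])
recoded
```

The ifelse() fallback is only one option; the point is that the decision to keep the original value is made after match(), not inside it.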

Starting point:

Df1 <- data.frame("aed_bloodpr" = c("1,2","2","1","1"),
                  "aed_gluco" = c("2","1","3","2"),
                  "add_bmi" = c("2","5,7","7","5"),
                  "add_asthma" = c("2","2","7","5"),
                  "nausea" = c("3","3","4","5"))

Df2 <- data.frame("NameOfVariable" = c("aed_bloodpr","aed_bloodpr","aed_gluco","aed_gluco","aed_gluco","add_bmi","add_bmi","add_bmi"),
                  "VariableLevel" = c(1,2,1,2,3,2,5,7),
                  "VariableDef" = c("high","normal","elevated","normal","NA","above","normal","below"))

End point:

Df3 <- data.frame("aed_bloodpr" = c("1,2","normal","high","high"),
                  "aed_gluco" = c("normal","elevated","NA","normal"), 
                  "add_bmi" = c("above","5,7","below","normal"), 
                  "add_asthma"=c("2","2","7","5"), 
                  "nausea" = c("3","3","4","5"))

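1 Answer:

Answer 0: (score: 1)

You need to clean the data up before you can relabel it. The relabelling itself is more easily done with a join. Here it is using the tidyverse (translate to another toolkit as you like):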

library(tidyverse)

Df1 <- data.frame("aed_bloodpr" = c("1,2","2","1","1"),
                  "aed_gluco" = c("2","1","3","2"),
                  "add_bmi" = c("2","5,7","7","5"),
                  "add_asthma" = c("2","2","7","5"),
                  "nausea" = c("3","3","4","5"))

Df2 <- data.frame("NameOfVariable" = c("aed_bloodpr","aed_bloodpr","aed_gluco","aed_gluco","aed_gluco","add_bmi","add_bmi","add_bmi"),
                  "VariableLevel" = c(1,2,1,2,3,2,5,7),
                  "VariableDef" = c("high","normal","elevated","normal","NA","above","normal","below"))

Df1_long <- Df1 %>% 
    mutate_all(as.character) %>%    # change factors to strings
    rowid_to_column('i') %>%    # add row index to enable later long-to-wide reshape
    gather(variable, value, -i) %>%    # reshape to long form
    separate_rows(value, convert = TRUE)    # unnest nested values and convert to numeric

str(Df1_long)
#> 'data.frame':    22 obs. of  3 variables:
#>  $ i       : int  1 1 2 3 4 1 2 3 4 1 ...
#>  $ variable: chr  "aed_bloodpr" "aed_bloodpr" "aed_bloodpr" "aed_bloodpr" ...
#>  $ value   : int  1 2 2 1 1 2 1 3 2 2 ...

Df2_clean <- Df2 %>% 
    mutate_if(is.factor, as.character) %>%    # change factors to strings
    mutate_if(is.character, na_if, 'NA')    # change "NA" strings to NA in character columns

Df3 <- Df1_long %>% 
    left_join(Df2_clean, by = c('variable' = 'NameOfVariable',    # merge
                                'value' = 'VariableLevel')) %>% 
    mutate(VariableDef = coalesce(VariableDef, as.character(value))) %>%    # combine labels and values
    group_by(i, variable) %>% 
    summarise(value = toString(VariableDef)) %>%    # re-aggregate multiple values
    spread(variable, value)    # reshape to wide form

Df3
#> # A tibble: 4 x 6
#> # Groups:   i [4]
#>       i add_asthma       add_bmi  aed_bloodpr aed_gluco nausea
#> * <int>      <chr>         <chr>        <chr>     <chr>  <chr>
#> 1     1          2         above high, normal    normal      3
#> 2     2          2 normal, below       normal  elevated      3
#> 3     3          7         below         high         3      4
#> 4     4          5        normal         high    normal      5
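
For comparison, the following is a minimal base R sketch of the Map() and match() approach named in the question. It is not the code from the question, and recode_column is a hypothetical helper introduced here. It assumes that a value with no matching level (such as "1,2") should be left untouched, as in the desired Df3, rather than split into "high, normal" as the join above does. Variables that do not appear in Df2 at all are passed through unchanged instead of raising an error.

```r
# Data from the question; stringsAsFactors = FALSE keeps columns as character.
Df1 <- data.frame(aed_bloodpr = c("1,2", "2", "1", "1"),
                  aed_gluco   = c("2", "1", "3", "2"),
                  add_bmi     = c("2", "5,7", "7", "5"),
                  add_asthma  = c("2", "2", "7", "5"),
                  nausea      = c("3", "3", "4", "5"),
                  stringsAsFactors = FALSE)

Df2 <- data.frame(NameOfVariable = c("aed_bloodpr", "aed_bloodpr", "aed_gluco",
                                     "aed_gluco", "aed_gluco", "add_bmi",
                                     "add_bmi", "add_bmi"),
                  VariableLevel  = c(1, 2, 1, 2, 3, 2, 5, 7),
                  VariableDef    = c("high", "normal", "elevated", "normal",
                                     "NA", "above", "normal", "below"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: recode one column, keeping values that have no match.
recode_column <- function(x, name, lookup) {
  keys <- lookup[lookup$NameOfVariable == name, ]     # zero rows if name is absent
  idx  <- match(x, as.character(keys$VariableLevel))  # NA where no level matches
  ifelse(is.na(idx), x, keys$VariableDef[idx])        # fall back to original value
}

# Apply the helper to every column together with its name.
Df3 <- Df1
Df3[] <- Map(function(x, name) recode_column(x, name, Df2), Df1, names(Df1))
Df3
```

Because the lookup is done on the whole string, "1,2" and "5,7" are never split; if they should be translated piece by piece, the long-format join shown above is the more natural fit.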