Applying a function to certain columns in a nested variable in R

Time: 2017-10-06 12:31:18

Tags: r purrr

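I want to apply a vectorized operation to certain columns inside a nested variable. The function I want to apply counts the missing values in the numeric features, i.e. weight and calories. The data frame I have is as follows: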

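library(dplyr)  # group_by(), mutate(), %>%
library(tidyr)  # nest()
library(purrr)  # map(), map_lgl()

df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA),
                 calories = c(NA, 20, NA, NA))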

  country   name  fruit weight calories
1      US  David  Apple     90       NA
2      US  James banana    110       20
3      UK Junaid orange    120       NA
4     PAK    Ali  melon     NA       NA

When I nest the data frame

nested_df <- df %>% group_by(country) %>% nest()


# A tibble: 3 × 2
  country             data
   <fctr>           <list>
1      US <tibble [2 × 4]>
2      UK <tibble [1 × 4]>
3     PAK <tibble [1 × 4]>

I tried the following syntax, but to no avail.

nested_df %>% mutate(missings = map(data, c("weight", "calories")) %>% 
                             map_lgl(function(x) sum(!is.na(x))/length(x) ==1))

The result I expect is as follows

# A tibble: 3 × 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 × 4]>    FALSE
2      UK <tibble [1 × 4]>    FALSE
3     PAK <tibble [1 × 4]>     TRUE

However, what I get is

# A tibble: 3 × 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 × 4]>       NA
2      UK <tibble [1 × 4]>       NA
3     PAK <tibble [1 × 4]>       NA

1 Answer:

Answer 0: (score: 0)

This checks whether more than 50% of the values are NA ...

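colstocheck <- c("weight", "calories")
# mean(is.na(...)) is the share of NA cells in the selected columns;
# length() on a data frame would count columns rather than cells
nested_df %>% mutate(missings = map_lgl(data,
                function(x) mean(is.na(x[, colstocheck])) > 0.5))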

# A tibble: 3 x 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 x 4]>    FALSE
2      UK <tibble [1 x 4]>    FALSE
3     PAK <tibble [1 x 4]>     TRUE
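
As for why the attempt in the question returns NA: as far as I can tell, passing a character vector such as c("weight", "calories") to map() does not select two columns. purrr treats it as a nested extraction (roughly data[["weight"]][["calories"]]), which finds nothing, so the check ends up computing 0/0. Below is a minimal sketch of an alternative that subsets the columns inside the mapped function. The names cols_to_check, all_missing and share_missing are made up for illustration, and the ungroup() call assumes a tidyr version in which nest() keeps the grouping.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  country  = c("US", "US", "UK", "PAK"),
  name     = c("David", "James", "Junaid", "Ali"),
  fruit    = c("Apple", "banana", "orange", "melon"),
  weight   = c(90, 110, 120, NA),
  calories = c(NA, 20, NA, NA)
)

# Hypothetical helper name: the columns to inspect in each nested tibble
cols_to_check <- c("weight", "calories")

result <- df %>%
  group_by(country) %>%
  nest() %>%
  ungroup() %>%  # drop the grouping that nest() may keep
  mutate(
    # TRUE only if every cell in the selected columns is NA
    all_missing = map_lgl(data, function(d) all(is.na(d[cols_to_check]))),
    # fraction of NA cells across the selected columns
    share_missing = map_dbl(data, function(d) mean(is.na(d[cols_to_check])))
  )

print(result)
```

On the example data, all_missing is FALSE for US and UK and TRUE for PAK, which matches the expected output in the question; share_missing is 0.25, 0.5 and 1 respectively, so a threshold such as share_missing > 0.5 reproduces the check above.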