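I want to apply a vectorized operation to certain columns inside a nested variable. The function I want to apply finds the sum of missing values in the numeric features, i.e. weight and calories. The data frame I have is as follows: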
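# dplyr (group_by, mutate, %>%), tidyr (nest) and purrr (map, map_lgl) are used below
library(dplyr); library(tidyr); library(purrr)
df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA), calories = c(NA, 20, NA, NA))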
  country   name  fruit weight calories
1      US  David  Apple     90       NA
2      US  James banana    110       20
3      UK Junaid orange    120       NA
4     PAK    Ali  melon     NA       NA
When I nest the data frame:
nested_df <- df %>% group_by(country) %>% nest()
# A tibble: 3 × 2
  country             data
   <fctr>           <list>
1      US <tibble [2 × 4]>
2      UK <tibble [1 × 4]>
3     PAK <tibble [1 × 4]>
I tried the following syntax, but it did not work:
nested_df %>% mutate(missings = map(data, c("weight", "calories")) %>%
map_lgl(function(x) sum(!is.na(x))/length(x) ==1))
The result I expect is:
# A tibble: 3 × 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 × 4]>    FALSE
2      UK <tibble [1 × 4]>    FALSE
3     PAK <tibble [1 × 4]>     TRUE
However, what I get is:
# A tibble: 3 × 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 × 4]>       NA
2      UK <tibble [1 × 4]>       NA
3     PAK <tibble [1 × 4]>       NA
Answer 0 (score: 0)
This checks whether more than 50% of the values are NA
...
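colstocheck <- c("weight", "calories")
# mean(is.na(...)) is the share of NA cells across the selected columns;
# length() of a data frame would count its columns, not its cells.
nested_df %>% mutate(missings = map_lgl(data,
  function(x) mean(is.na(x[, colstocheck])) > 0.5))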
# A tibble: 3 x 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 x 4]>    FALSE
2      UK <tibble [1 x 4]>    FALSE
3     PAK <tibble [1 x 4]>     TRUE
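
As for why the attempt in the question returns NA: as far as I can tell, passing a character vector such as c("weight", "calories") to map() is treated as a nested extraction (take weight, then look for calories inside it) rather than a column selection, so each group appears to yield NULL. For an empty input the ratio sum(!is.na(x))/length(x) is 0/0, and comparing NaN with 1 gives NA. The sketch below reproduces that arithmetic in isolation and then computes both the count and the share of missing values per group. The names cols_to_check, n_missing and prop_missing are mine, not from the question, and the 0.5 threshold is the one used in the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  country  = c("US", "US", "UK", "PAK"),
  name     = c("David", "James", "Junaid", "Ali"),
  fruit    = c("Apple", "banana", "orange", "melon"),
  weight   = c(90, 110, 120, NA),
  calories = c(NA, 20, NA, NA)
)

nested_df <- df %>% group_by(country) %>% nest()

# The NA in the question: with an empty input the ratio is 0/0 (NaN),
# and NaN == 1 evaluates to NA.
empty <- NULL
na_check <- sum(!is.na(empty)) / length(empty) == 1

# Hypothetical helper name for the columns of interest.
cols_to_check <- c("weight", "calories")

result <- nested_df %>%
  ungroup() %>%  # drop the grouping that nest() may keep
  mutate(
    # number of NA cells in the selected columns of each nested tibble
    n_missing    = map_int(data, function(d) sum(is.na(d[cols_to_check]))),
    # share of NA cells in the selected columns
    prop_missing = map_dbl(data, function(d) mean(is.na(d[cols_to_check]))),
    # TRUE when more than half of those cells are NA
    missings     = prop_missing > 0.5
  )

print(result)
```

With the sample data, US has 1 missing cell out of 4, UK has 1 out of 2, and PAK has 2 out of 2, so only PAK ends up with missings equal to TRUE. Keeping n_missing alongside the logical flag also covers the "sum of missing values" the question originally asked about.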