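I want to fill the NA values in each group by sampling from the non-NA values of the same group.
This is the closest I have come to what I want, using !is.na() as suggested in
"Ignoring values or NAs in the sample function".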
> dput(data)
structure(list(len = c(NA, 45447.4157838775, 161037.71538108,
78147.8550470324, 7193.48815617057, 1571.95459212405, 18191.381972185,
20366.2132412031, 10014.987524596, 1403.72511829297, 5651.17842991513,
6848.03271105711, 8043.32937011393, 8926.65133418451, 5808.44456603825,
2208.14264175252, 1797.4936747033, 5325.76651327694, 2660.66730207955,
5844.07912541444, 3956.40473896271, 959.873314407621, 3294.01472360025,
5221.94864001864, 3781.51913857335, 7811.83819953768, 3387.20323328623,
5514.92099458441, 5792.54371531706, 5643.98385143961, 15478.916809379,
8401.66533205217, 7046.25074819247, 2734.73639821402, NA, 62332.3343404513,
NA, 46563.1214718113, 25590.4020105238, 13015.3682275862, 4984.80432801441,
NA), point = c(NA, 0, 8, 5, 2, 0, 9, 0, 0, 0, 3, 1, 0, 6, 1,
1, 0, 0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, NA,
10, NA, 19, 6, 5, 0, NA), country = structure(c(1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L,
3L, 2L, 2L, 2L, 2L, 1L), .Label = c("WCY_____ES", "WCY_____FR",
"WCY_____IT"), class = "factor"), group = c(1L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L)), row.names = c(NA, -42L), class = "data.frame")
library(dplyr)
data1 <- data %>%
group_by(group) %>%
mutate(nulen = if_else(country == 'WCY_____FR', len, sample(len[!is.na(len)], 1, TRUE)),
nupoint = if_else(country == 'WCY_____FR', point, sample(point[!is.na(point)], 1, TRUE)))
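But I get: Error in sample.int(length(x), size, replace, prob) :
invalid first argument
There should be no significant difference between the known distribution and the gap-filled values. If a group has no values to sample from (all other values are NA,
or the group has only one row), the sample should be drawn from the whole dataset. Any package is fine.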
Answer 0 (score: 0)
Here is an idea,
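dd <- data  # assumption: `dd` is the question's data frame `data`
dd %>%
  # Fallback columns: replace every NA with one value sampled from the whole column
  mutate(len1 = replace(len, is.na(len), sample(len[!is.na(len)], 1, TRUE)),
         point1 = replace(point, is.na(point), sample(point[!is.na(point)], 1, TRUE))) %>%
  group_by(group) %>%
  # For 'WCY_____FR' rows: use len1 if the whole group is NA, otherwise fill NA
  # from the group's own non-NA values; all other rows keep len unchanged
  mutate(nulen = ifelse(all(is.na(len)) & country == 'WCY_____FR', len1,
                 ifelse(is.na(len) & country == 'WCY_____FR', sample(len[!is.na(len)], 1, TRUE), len)))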
which gives,
       len point country    group    len1 point1   nulen
     <dbl> <dbl> <fct>      <int>   <dbl>  <dbl>   <dbl>
 1     NA     NA WCY_____ES     1   1572.      0     NA
 2  45447.     0 WCY_____FR     2  45447.      0  45447.
 3 161038.     8 WCY_____FR     2 161038.      8 161038.
 4  78148.     5 WCY_____FR     2  78148.      5  78148.
 5   7193.     2 WCY_____FR     3   7193.      2   7193.
 6   1572.     0 WCY_____FR     3   1572.      0   1572.
 7  18191.     9 WCY_____FR     3  18191.      9  18191.
 8  20366.     0 WCY_____FR     3  20366.      0  20366.
 9  10015.     0 WCY_____FR     3  10015.      0  10015.
10   1404.     0 WCY_____FR     3   1404.      0   1404.
# ... with 32 more rows
The same can be done for point.
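The error in the question appears whenever a group has no non-NA values: len[!is.na(len)] is then empty, and sample() cannot draw from a vector of length zero. As a minimal sketch of the fallback the question asks for, the helper below (fill_na_by_sampling is a hypothetical name, not part of dplyr or of the answer above) samples from the group's own values and switches to the whole column when the group has none. It assumes each NA should get its own independent draw and it ignores the country condition; the toy data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: fill the NAs in `x` by sampling with replacement
# from the non-NA values of `x`; if there are none, sample from the
# non-NA values of `fallback` (for example, the whole column) instead.
fill_na_by_sampling <- function(x, fallback) {
  pool <- x[!is.na(x)]
  if (length(pool) == 0) pool <- fallback[!is.na(fallback)]
  n_missing <- sum(is.na(x))
  if (n_missing > 0 && length(pool) > 0) {
    # Sample indices with sample.int() rather than calling sample(pool, ...):
    # sample() treats a single number n >= 1 as the range 1:n.
    x[is.na(x)] <- pool[sample.int(length(pool), n_missing, replace = TRUE)]
  }
  x
}

set.seed(1)  # only to make the example reproducible

# Made-up toy data: group 1 is all NA, group 3 has a single known value.
toy <- data.frame(
  len   = c(NA, 10, 20, NA, 30, NA),
  group = c(1L, 2L, 2L, 2L, 3L, 3L)
)

all_len <- toy$len  # whole-dataset pool, captured before grouping

filled <- toy %>%
  group_by(group) %>%
  mutate(nulen = fill_na_by_sampling(len, all_len)) %>%
  ungroup()

print(filled)
```

With the question's data, the same call would be data %>% group_by(group) %>% mutate(nulen = fill_na_by_sampling(len, all_len)) with all_len <- data$len, and likewise for point. Whether the filled values stay close to the known distribution still has to be checked separately.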