I'm still fairly new to R, so the solution to this problem may well be simple.
Suppose I have a nest dataset for two bird species (a and b), as follows:
df
year nestid sp egg chick
2013 a1 a 2 1
2013 a2 a NA 1
2013 a3 a NA 0
2013 a4 a NA 1
2013 a5 a NA 0
2013 b1 b 2 0
2013 b2 b NA 1
2013 b3 b NA 2
2013 b4 b NA 1
2014 a1 a NA 1
2014 a2 a NA 1
2014 a3 a 1 1
2014 a4 a NA 1
2014 a5 a NA 1
2014 b1 b NA 1
2014 b2 b NA 2
2014 b3 b NA 2
2014 b4 b NA 1
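I want to infer the missing ("NA") egg counts from the number of chicks. If a nest has 2 chicks, then, since at most 2 eggs are laid, it makes sense to replace the NA with 2.
For nests with 1 chick, however, I want to split the NAs at random. For species "a" in 2013, a randomly selected 80% of the 1-chick nests with NA eggs should get 2, and the remaining 20% should get 1. For species "a" in 2014, the proportions of clutch sizes 2 and 1 are 40% and 60%, respectively.
I tried the following, but couldn't figure out how to code it correctly.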
df%>% mutate(egg=ifelse(egg==0 & chick==2, 2, egg))
df%>%
mutate(egg=ifelse(egg==0 & chick==1 & year==2013, sample_frac(.8)==2, egg))
Any help would be appreciated!
Thanks a lot.
Answer 0 (score: 0)
One approach could be:
set.seed(123)
#missing egg & chick = 2
df$egg <- with(df,ifelse(is.na(egg) & chick == 2, 2, egg))
#2013 data having species = 'a', missing egg & chick = 1
x <- with(df, which(is.na(egg) & chick == 1 & sp == 'a' & year == 2013))
x_sample <- sample(x, round(0.8 * length(x)))
df$egg[x_sample] <- 2
df$egg[setdiff(x, x_sample)] <- 1
#2014 data having species = 'a', missing egg & chick = 1
x <- with(df, which(is.na(egg) & chick == 1 & sp == 'a' & year == 2014))
x_sample <- sample(x, round(0.4 * length(x)))
df$egg[x_sample] <- 2
df$egg[setdiff(x, x_sample)] <- 1
which gives
> df
year nestid sp egg chick
1 2013 a1 a 2 1
2 2013 a2 a 2 1
3 2013 a3 a NA 0
4 2013 a4 a 2 1
5 2013 a5 a NA 0
6 2013 b1 b 2 0
7 2013 b2 b NA 1
8 2013 b3 b 2 2
9 2013 b4 b NA 1
10 2014 a1 a 1 1
11 2014 a2 a 2 1
12 2014 a3 a 1 1
13 2014 a4 a 2 1
14 2014 a5 a 1 1
15 2014 b1 b NA 1
16 2014 b2 b 2 2
17 2014 b3 b 2 2
18 2014 b4 b NA 1
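The two near-identical blocks for 2013 and 2014 could be folded into one small function. The following is only a sketch under stated assumptions: `fill_split` and the `rules` table are hypothetical names that do not appear in the answer above, and the data frame holds just the species "a" rows from the question. It also indexes through `sample.int()`, because `sample(x, n)` treats a single number `x` as `1:x`, which would matter if only one row matched.

```r
# Hypothetical helper (not part of the original answer): among the rows in
# `idx`, set a random fraction `p_high` to `high` and the rest to `low`.
fill_split <- function(egg, idx, p_high, high = 2L, low = 1L) {
  if (length(idx) == 0L) return(egg)          # nothing to fill
  n_high <- round(p_high * length(idx))
  # Sample positions within idx, so a single index is never read as 1:idx
  picked <- idx[sample.int(length(idx), n_high)]
  egg[picked] <- high
  egg[setdiff(idx, picked)] <- low
  egg
}

# Species "a" rows from the question's data
df <- data.frame(
  year  = rep(c(2013L, 2014L), each = 5),
  sp    = "a",
  egg   = c(2L, NA, NA, NA, NA, NA, NA, 1L, NA, NA),
  chick = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L)
)

# Hypothetical rule table: share of 1-chick nests that get clutch size 2
rules <- data.frame(year = c(2013L, 2014L), p_high = c(0.8, 0.4))

set.seed(123)
for (i in seq_len(nrow(rules))) {
  idx <- with(df, which(is.na(egg) & chick == 1 & sp == "a" &
                        year == rules$year[i]))
  df$egg <- fill_split(df$egg, idx, rules$p_high[i])
}
```

With so few nests the proportions are only approximate: 2013 has two matching nests, and `round(0.8 * 2)` is 2, so both receive 2. Nests with 0 chicks are left as NA, as in the answer.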
Sample data
df <- structure(list(year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L), nestid = c("a1", "a2", "a3", "a4", "a5",
"b1", "b2", "b3", "b4", "a1", "a2", "a3", "a4", "a5", "b1", "b2",
"b3", "b4"), sp = c("a", "a", "a", "a", "a", "b", "b", "b", "b",
"a", "a", "a", "a", "a", "b", "b", "b", "b"), egg = c(2L, NA,
NA, NA, NA, 2L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA
), chick = c(1L, 1L, 0L, 1L, 0L, 0L, 1L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 1L)), .Names = c("year", "nestid", "sp",
"egg", "chick"), class = "data.frame", row.names = c(NA, -18L
))
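As a side note on why the attempt in the question matched nothing: the missing eggs are NA, not 0, and comparing NA with `==` yields NA rather than TRUE, so `ifelse()` never substitutes a value. (`sample_frac()` also expects a data frame as its first argument, so it cannot be used as a value inside `ifelse()`.) A minimal illustration on made-up three-element vectors, not the full dataset:

```r
# Illustrative vectors only
egg   <- c(2L, NA, NA)
chick <- c(1L, 2L, 1L)

# NA == 0 is NA, so the condition is never TRUE for a missing egg count
# and the NA is passed through unchanged.
attempt <- ifelse(egg == 0 & chick == 2, 2, egg)

# is.na() tests for missingness directly, so the 2-chick nest is filled.
fixed <- ifelse(is.na(egg) & chick == 2, 2, egg)
```

Here `attempt` still has NA in the second position, while `fixed` has 2 there; the third element stays NA in both because that nest has only 1 chick.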