Suppose I have the following dataset.
Index  Country  Age    Time   Response
--------------------------------------
1      Germany  20-30  15-20  1
2      Germany  20-30  15-20  NA
3      Germany  20-30  15-20  1
4      Germany  20-30  15-20  0
5      France   20-30  30-40  1
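I want to fill in the NA according to the criteria listed below.
I would like the remaining NAs in the dataset to be handled in the same way.
I'm new to R and can't figure out how to code this.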
Answer 0 (score: 2)
Here is one approach using the "data.table" package:
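library(data.table)
# Key by the grouping columns so that key(DT) can be reused in `by`
DT <- data.table(mydf, key = "Country,Age,Time")
# Within each group, replace NA with one value drawn at random from the
# group's non-NA responses (results vary between runs unless set.seed() is used)
DT[, R2 := ifelse(is.na(Response), sample(na.omit(Response), 1),
                  Response), by = key(DT)]
DT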
#    Index Country   Age  Time Response R2
# 1:     5  France 20-30 30-40        1  1
# 2:     6  France 20-30 30-40       NA  2
# 3:     7  France 20-30 30-40        2  2
# 4:     1 Germany 20-30 15-20        1  1
# 5:     2 Germany 20-30 15-20       NA  1
# 6:     3 Germany 20-30 15-20        1  1
# 7:     4 Germany 20-30 15-20        0  0
Similarly, in base R, you could try ave:
within(mydf, {
  R2 <- ave(Response, Country, Age, Time, FUN = function(x) {
    ifelse(is.na(x), sample(na.omit(x), 1), x)
  })
})
Sorry, I forgot to share the sample data I was using:
mydf <- structure(list(Index = 1:7, Country = c("Germany", "Germany",
"Germany", "Germany", "France", "France", "France"), Age = c("20-30",
"20-30", "20-30", "20-30", "20-30", "20-30", "20-30"), Time = c("15-20",
"15-20", "15-20", "15-20", "30-40", "30-40", "30-40"), Response = c(1L,
NA, 1L, 0L, 1L, NA, 2L)), .Names = c("Index", "Country", "Age",
"Time", "Response"), class = "data.frame", row.names = c(NA, -7L))
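Both snippets above rely on sample(na.omit(x), 1), which has a few edge cases worth knowing about. When a group has exactly one non-NA value and that value is a number n >= 1, sample() treats it as 1:n. When a group has no non-NA values at all, sample() raises an error. And because only one value is drawn per group, every NA in that group receives the same fill. The following is a minimal sketch of a sturdier variant, not part of the original answer: the helper name fill_na_from_group and the small data frame are made up for illustration, and it assumes that an independent draw per NA is what you want.

```r
# Hypothetical helper (not from the answer above): fill each NA in x with an
# independent random draw from the non-NA values of x.
fill_na_from_group <- function(x) {
  miss <- is.na(x)
  pool <- x[!miss]
  # Nothing to fill, or nothing to draw from: return x unchanged
  if (!any(miss) || length(pool) == 0L) return(x)
  # Index into pool via sample.int() so a pool like c(2L) is not read as 1:2
  x[miss] <- pool[sample.int(length(pool), sum(miss), replace = TRUE)]
  x
}

# Small made-up data set for illustration
df <- data.frame(
  Country  = c("Germany", "Germany", "Germany", "France", "France", "Spain"),
  Response = c(2L, NA, NA, NA, 5L, NA)
)

set.seed(1)  # only to make the random draws reproducible
df$R2 <- ave(df$Response, df$Country, FUN = fill_na_from_group)
df
# R2 is 2, 2, 2, 5, 5, NA: each pool holds a single value, and Spain has none
```

The same helper could presumably be dropped into the data.table call as DT[, R2 := fill_na_from_group(Response), by = key(DT)].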