Implementing code in R?

Time: 2014-02-09 10:05:33

Tags: r

Suppose I have the following dataset.

Index  Country  Age    Time   Response
-----  -------  -----  -----  --------
1      Germany  20-30  15-20  1
2      Germany  20-30  15-20  NA
3      Germany  20-30  15-20  1
4      Germany  20-30  15-20  0
5      France   20-30  30-40  1

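I want to fill in the NA values according to the criteria listed below:

  1. Find all rows that match exactly on Country, Age and Time, i.e. Index 1, 3 and 4.
  2. Randomly select one value from the "Response" column of those matching rows, i.e. 1, 1 or 0.
  3. Replace the NA with this new value.
  4. Continue in the same way for the remaining NA values in the dataset.

I am new to R and cannot figure out how to code this.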
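Taken literally, the four steps can be sketched as a row-by-row loop in base R. This is only an illustration under some assumptions: the data frame `df` holds the five rows from the question, values are drawn only from responses that were observed originally, and an NA with no matching rows is left as NA.

```r
# The five rows from the question.
df <- data.frame(
  Index    = 1:5,
  Country  = c("Germany", "Germany", "Germany", "Germany", "France"),
  Age      = "20-30",
  Time     = c("15-20", "15-20", "15-20", "15-20", "30-40"),
  Response = c(1L, NA, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

original <- df$Response            # draw only from originally observed values
for (i in which(is.na(original))) {
  # Step 1: rows that match row i exactly on Country, Age and Time
  #         and have an observed Response.
  matches <- which(df$Country == df$Country[i] &
                   df$Age     == df$Age[i] &
                   df$Time    == df$Time[i] &
                   !is.na(original))
  if (length(matches) == 0L) next  # assumption: leave the NA if nothing matches
  # Step 2: pick one matching row at random. Sampling row positions avoids
  #         sample()'s special handling of a single number.
  pick <- matches[sample.int(length(matches), 1L)]
  # Step 3: replace the NA with that row's Response.
  df$Response[i] <- original[pick]
}
df$Response[2]  # either 0 or 1, drawn from rows 1, 3 and 4
```

A loop like this is easy to follow but slow on large data; grouped approaches do the same work per group instead of per row.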

1 Answer:

Answer 0: (score: 2)

Here is one approach using the "data.table" package:

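library(data.table)

# Key by the grouping columns, then fill within each Country/Age/Time group.
DT <- data.table(mydf, key = "Country,Age,Time")
# One value is drawn per group and reused for every NA in that group.
DT[, R2 := ifelse(is.na(Response), sample(na.omit(Response), 1),
                  Response), by = key(DT)]
DT
# (the fill is random, so R2 in the NA rows can differ between runs)
#    Index Country   Age  Time Response R2
# 1:     5  France 20-30 30-40        1  1
# 2:     6  France 20-30 30-40       NA  2
# 3:     7  France 20-30 30-40        2  2
# 4:     1 Germany 20-30 15-20        1  1
# 5:     2 Germany 20-30 15-20       NA  1
# 6:     3 Germany 20-30 15-20        1  1
# 7:     4 Germany 20-30 15-20        0  0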
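
Two edge cases are worth knowing about with `sample(na.omit(Response), 1)`. If a group has exactly one observed value and it is a number n >= 1, `sample()` draws from 1:n instead of returning that value. If a group has no observed values at all, `sample()` fails. The following is a minimal sketch of a more defensive helper. It assumes each NA should get its own independent draw and that all-NA groups should stay NA. The name `fill_na_from_group` is hypothetical and not part of base R or data.table.

```r
# Hypothetical helper: replace each NA in x with a value drawn at random
# (with replacement) from the non-missing values of x.
fill_na_from_group <- function(x) {
  observed <- x[!is.na(x)]
  missing  <- which(is.na(x))
  # Nothing to fill, or nothing to draw from: return x unchanged.
  if (length(missing) == 0L || length(observed) == 0L) return(x)
  # Index into `observed` via sample.int() so a single value such as 5
  # is not expanded to 1:5, as sample(5, 1) would do.
  picks <- sample.int(length(observed), size = length(missing), replace = TRUE)
  x[missing] <- observed[picks]
  x
}

set.seed(42)  # only to make the example reproducible
fill_na_from_group(c(5L, NA, NA))   # both NAs become 5
fill_na_from_group(c(NA, NA))       # stays all NA
```

Under those assumptions, the helper could stand in for the `ifelse()` call, for example `DT[, R2 := fill_na_from_group(Response), by = key(DT)]`.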

Similarly, in base R, you can try ave():

within(mydf, {
  R2 <- ave(Response, Country, Age, Time, FUN = function(x) {
    ifelse(is.na(x), sample(na.omit(x), 1), x)
  })
})
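
Because the fill is random, it can help to fix the seed and check the result. The sketch below assumes data equivalent to the sample data given at the end of this answer (rebuilt here so the block runs on its own) and uses an arbitrary seed. It checks that observed responses are untouched and that each filled value comes from its own group.

```r
# Sample data equivalent to the answer's mydf.
mydf <- data.frame(
  Index    = 1:7,
  Country  = c(rep("Germany", 4), rep("France", 3)),
  Age      = "20-30",
  Time     = c(rep("15-20", 4), rep("30-40", 3)),
  Response = c(1L, NA, 1L, 0L, 1L, NA, 2L),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only so repeated runs give the same fill
out <- within(mydf, {
  R2 <- ave(Response, Country, Age, Time, FUN = function(x) {
    ifelse(is.na(x), sample(na.omit(x), 1), x)
  })
})

# Observed responses must be untouched ...
obs <- !is.na(mydf$Response)
stopifnot(all(out$R2[obs] == mydf$Response[obs]))
# ... and each filled value must come from its own group.
stopifnot(out$R2[2] %in% c(0L, 1L), out$R2[6] %in% c(1L, 2L))
```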

Sorry, I forgot to share the sample data I was working with:

mydf <- structure(list(Index = 1:7, Country = c("Germany", "Germany", 
"Germany", "Germany", "France", "France", "France"), Age = c("20-30", 
"20-30", "20-30", "20-30", "20-30", "20-30", "20-30"), Time = c("15-20", 
"15-20", "15-20", "15-20", "30-40", "30-40", "30-40"), Response = c(1L, 
NA, 1L, 0L, 1L, NA, 2L)), .Names = c("Index", "Country", "Age", 
"Time", "Response"), class = "data.frame", row.names = c(NA, -7L))