How to replace NAs resulting from joining tables using data.table in R

Time: 2017-06-13 22:28:13

Tags: r join data.table

I have 2 datasets:

library(data.table)

# Sex (length 2) is recycled to match the 100 sampled ages
a <- data.table(Sex = c("male", "female"), Age = sample(c(10,20,30), 100, replace = TRUE), 
                Survived = sample(0:1, 100, replace = TRUE))[, ID := .I]

b <- data.table(Sex = c("male", "female"), Age = sample(c(10,20,40), 100, replace = TRUE), 
                Survived = sample(0:1, 100, replace = TRUE))[, ID := .I]

Then I create a third dataset, props:

props <- a[, list(.N, percentSurvived = mean(Survived)), keyby = list(Sex, Age)]
props[, Prediction := as.integer(percentSurvived > 0.5 )]

When I join them, some rows get NA for the prediction derived from percentSurvived, because "b" has some ages that "a" does not:

b[props, Prediction := i.Prediction, on = c("Sex", "Age")]
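
To see which Sex/Age combinations cause the NAs, an anti-join is one option. The sketch below uses small made-up tables (props_toy and b_toy are hypothetical stand-ins, not the data above) to show the idea:

```r
library(data.table)

# Toy stand-ins for props and b; the values are invented for illustration.
props_toy <- data.table(Sex = c("female", "male"), Age = c(10, 10),
                        Prediction = c(1L, 0L))
b_toy <- data.table(Sex = c("male", "female", "male"), Age = c(10, 40, 40))

# Anti-join: keep the rows of b_toy whose (Sex, Age) pair has no match in props_toy.
unmatched <- b_toy[!props_toy, on = c("Sex", "Age")]
print(unique(unmatched[, list(Sex, Age)]))
```

These unmatched rows are exactly the ones that would receive NA in the update join.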

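I want to take the resulting NAs and fill in a prediction based on the Sex variable alone. I tried this (subsetting the NA rows and joining only on Sex):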

b[is.na(Prediction)][props, Prediction := i.Prediction, on = c("Sex" )]

But I still have NAs:

      Sex Age Survived  ID Prediction
  1:   male  40        1   1         NA
  2: female  40        1   2         NA
  3:   male  20        0   3          1
  4: female  20        0   4          0
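
A likely reason the second attempt changes nothing: b[is.na(Prediction)] returns a new table, so the := that follows updates that temporary copy rather than b. In addition, props has several rows per Sex (one per Age), so a Sex-only join against it has no single prediction to pick. A minimal sketch of one possible fix follows. It assumes a fixed seed, spells out the recycling of Sex with rep(), and introduces a hypothetical per-Sex lookup table named propsSex:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

a <- data.table(Sex = rep(c("male", "female"), 50),
                Age = sample(c(10, 20, 30), 100, replace = TRUE),
                Survived = sample(0:1, 100, replace = TRUE))[, ID := .I]
b <- data.table(Sex = rep(c("male", "female"), 50),
                Age = sample(c(10, 20, 40), 100, replace = TRUE),
                Survived = sample(0:1, 100, replace = TRUE))[, ID := .I]

props <- a[, list(.N, percentSurvived = mean(Survived)), keyby = list(Sex, Age)]
props[, Prediction := as.integer(percentSurvived > 0.5)]

# First pass: join on Sex and Age; ages that do not occur in a stay NA.
b[props, Prediction := i.Prediction, on = c("Sex", "Age")]

# Fallback lookup with one row per Sex (propsSex is a hypothetical name).
propsSex <- a[, list(Prediction = as.integer(mean(Survived) > 0.5)), by = Sex]

# Second pass: select the NA rows in i and assign in j, so b itself is
# updated by reference. .SD holds the NA rows; x.Prediction is propsSex's column.
b[is.na(Prediction), Prediction := propsSex[.SD, on = "Sex", x.Prediction]]
```

Putting the filter in i and the assignment in j of the same call is what makes the update land in b; rows that already had a prediction from the first pass are left untouched.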

0 answers:

No answers