data.table does not recognize certain values in a column, but recognizes all the others

Asked: 2017-06-27 18:34:01

Tags: r data.table

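I am new to data.table in R and have run into a problem where some values of a column named "coverage" are not recognized. I created the data table as follows: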

dt <- as.data.table(expand.grid(coverage = c(seq(0, 0.9, 0.1), 0.99),
                            year     = seq(0, 15, 1)),
                cum_inf = numeric())

I then want to fill in the cum_inf column by loading .RData files and extracting the relevant information from each one:

for(i in 1:length(files)) {
  load(files[i])
  model <- eval(parse(text = file_names[i]))
  cov   <- (model$param$perc_vaccinated*3*365)/(1 + model$param$perc_vaccinated*3*365)
  for(j in 0:15) {
    dt[coverage == cov & year == j, cum_inf := mean(sapply(model$popsumm[[1]], function(x) {
      if(j == 0) { 0 } else {
        sum(x[1]:x[(365/5)*j])
      }
    }))]
  }
  rm(list=ls(pattern="sens"))
}

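However, the coverage values 0.3, 0.6 and 0.7 are not recognized, so the corresponding cum_inf values are never filled in. For example, if I type dt[coverage == 0.2], R prints this to the console: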

  coverage year cum_inf
 1:      0.2    0    0.00
 2:      0.2    1   16.05
 3:      0.2    2   20.40
 4:      0.2    3   11.50
 5:      0.2    4   17.45
 6:      0.2    5   11.25
 7:      0.2    6   14.70
 8:      0.2    7   10.90
 9:      0.2    8    8.35
10:      0.2    9    7.50
11:      0.2   10    5.90
12:      0.2   11    3.60
13:      0.2   12    4.50
14:      0.2   13    3.05
15:      0.2   14    4.70
16:      0.2   15    3.35

But dt[coverage == 0.3] returns Empty data.table (0 rows) of 3 cols: coverage,year,cum_inf. I know the fourth row of the data table has a coverage value of 0.3, so I tried dt[4,] to see what is actually stored there, and it looks like 0.3:

   coverage year cum_inf
1:      0.3    0      NA

Likewise, dt[coverage == dt[4, coverage]] prints this to the console:

    coverage year cum_inf
 1:      0.3    0      NA
 2:      0.3    1      NA
 3:      0.3    2      NA
 4:      0.3    3      NA
 5:      0.3    4      NA
 6:      0.3    5      NA
 7:      0.3    6      NA
 8:      0.3    7      NA
 9:      0.3    8      NA
10:      0.3    9      NA
11:      0.3   10      NA
12:      0.3   11      NA
13:      0.3   12      NA
14:      0.3   13      NA
15:      0.3   14      NA
16:      0.3   15      NA

I would really appreciate help understanding why these three values in the coverage column are not recognized the same way as the others.

1 answer:

Answer 0: (score: 1)

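This is floating-point error. The values produced by seq() differ from the literals you type by a tiny amount, far beyond the digits printed by default (around the 17th significant digit). Printing the column with 20 digits shows it: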

print(dt$coverage,digits=20)
  [1] 0.00000000000000000 0.10000000000000001 0.20000000000000001 0.30000000000000004 0.40000000000000002 0.50000000000000000 0.60000000000000009
  [8] 0.70000000000000007 0.80000000000000004 0.90000000000000002 1.00000000000000000 0.00000000000000000 0.10000000000000001 0.20000000000000001
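
To make the mismatch concrete, here is a minimal base R sketch. It assumes, as the output above suggests, that seq() builds each element by multiplying the step by an index, so the fourth element is 3 * 0.1 rather than the literal 0.3. The variable name grid is only for illustration.

```r
# Values built by floating-point arithmetic, as in the question's expand.grid() call
grid <- seq(0, 0.9, 0.1)

# The literals one would type at the console
literals <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

# Positions where the computed value is not bit-identical to the literal;
# on a standard IEEE 754 double platform these should be the 0.3, 0.6 and 0.7 entries
mismatch <- which(grid != literals)
print(mismatch)

# Exact comparison fails for 0.3 ...
exact_equal <- grid[4] == 0.3

# ... while a tolerance-based comparison or a rounded value succeeds
near_equal    <- isTRUE(all.equal(grid[4], 0.3))
rounded_equal <- round(grid[4], 2) == 0.3

print(c(exact = exact_equal, near = near_equal, rounded = rounded_equal))
```

This is why dt[coverage == 0.2] finds rows while dt[coverage == 0.3] does not: 2 * 0.1 happens to land on the same double as the literal 0.2, but 3 * 0.1 does not land on the literal 0.3.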

Round the values in the statement that generates coverage:

dt <- as.data.table(expand.grid(coverage = round(c(seq(0, 0.9, 0.1), .99),2),
                                year     = seq(0, 15, 1)),
                    cum_inf = numeric())

>dt[coverage==.3]

    coverage year
 1:      0.3    0
 2:      0.3    1
 3:      0.3    2
 4:      0.3    3
 5:      0.3    4
 6:      0.3    5
 7:      0.3    6
 8:      0.3    7
 9:      0.3    8
10:      0.3    9
11:      0.3   10
12:      0.3   11
13:      0.3   12
14:      0.3   13
15:      0.3   14
16:      0.3   15
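
Note that the cov value computed inside the question's loop also comes out of floating-point arithmetic, so it may need the same treatment before it is compared with the column. The following is a minimal sketch under stated assumptions, not the original poster's code: cov is a hypothetical stand-in for the value derived from model$param$perc_vaccinated, the assigned numbers are placeholders, and the tolerance tol is an arbitrary choice.

```r
library(data.table)

# Grid with coverage rounded to two decimals, as in the answer above
dt <- as.data.table(expand.grid(coverage = round(c(seq(0, 0.9, 0.1), 0.99), 2),
                                year     = seq(0, 15, 1)))
dt[, cum_inf := NA_real_]   # create the column to be filled in explicitly

# Hypothetical stand-in for the coverage computed from the model parameters;
# 0.1 * 3 is stored as 0.30000000000000004, not as the literal 0.3
cov <- 0.1 * 3

# Option 1: round the lookup value to the same precision as the column
dt[coverage == round(cov, 2) & year == 0, cum_inf := 0]

# Option 2: compare within a tolerance instead of testing exact equality
tol <- 1e-8                 # assumed tolerance, far smaller than the grid spacing
dt[abs(coverage - cov) < tol & year == 1, cum_inf := 1]   # 1 is a placeholder value

# Both rows for coverage 0.3 should now be filled in
filled <- dt[!is.na(cum_inf)]
print(filled)
```

Rounding keeps the == syntax from the question unchanged, while the tolerance comparison avoids having to decide on a number of decimals in advance. Either way, the key point is that both sides of the comparison are treated consistently.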