R: strange error when imputing missing values with missForest

Asked: 2019-01-31 13:16:12

Tags: r dataframe random-forest missing-data

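I am testing several multiple-imputation packages in R on a number of time series data sets with significant holes (missing values). I was able to run the tests successfully with Hmisc and MICE. However, I cannot get the missForest method to run, even though it looks like the simplest of the three.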

Example: I have a data.frame df_final with 2 columns:

day_of_year (1,2,3,....365 -> 365 integer values, no NA)
bookings  (279 integer values, 86 NA values)

My goal is to fill the 86 NA values with missForest.
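Before imputing, the structure described above (row count, NA count per column, column types) can be checked directly. The following is a minimal sketch on a toy stand-in for df_final; the object `toy` and its values are made up for illustration and are not the real data.

```r
# Toy stand-in for df_final; the values are made up for illustration only
toy <- data.frame(
  day_of_year = 1:10,
  bookings    = c(6L, 12L, NA, 0L, 2L, NA, 19L, 25L, NA, 47L)
)

n_rows   <- nrow(toy)             # number of observations
na_count <- colSums(is.na(toy))   # missing values per column
col_type <- sapply(toy, typeof)   # storage type per column

print(n_rows)
print(na_count)
print(col_type)
```

Running the same three calls on df_final should report 365 rows and 86 missing values in bookings if the description above is accurate.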

Here is my code:

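library(missForest)

final.imp <- missForest(df_final, verbose = TRUE)
final.imp$OOBerror  # estimated out-of-bag imputation error
final.imp$error     # true imputation error; only available when xtrue is supplied

imputed_df <- final.imp$ximp  # the imputed data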

This is the error (posted as a screenshot; the message text is not included here).

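How is this possible? Both of my columns have the same length (365). If the error were caused by the NA values, the algorithm would defeat its own purpose. I must be doing something wrong.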

The same code works perfectly with the iris data set.
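Since the code runs on iris, it may help to look at how iris differs from df_final. The sketch below only inspects iris with base R; whether any of these differences is the actual cause of the error is an assumption, not something established in this thread.

```r
# iris ships with R in the datasets package
iris_class <- class(iris)   # "data.frame": a plain data frame
iris_ncol  <- ncol(iris)    # 5 columns, so 4 predictors remain for any one target
iris_na    <- anyNA(iris)   # FALSE: the stock data set has no missing values

print(iris_class)
print(iris_ncol)
print(iris_na)
```

Comparing these with class(df_final), ncol(df_final) and anyNA(df_final) shows where the two inputs differ.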

Edit: adding the output of dput()

> dput(df_final)
structure(list(day_of_year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 
105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 
118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 
131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 
144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 
157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 
170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 
183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 
196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 
209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 
222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 
235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 
248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 
261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 
274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 
287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 
300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 
313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 
326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 
339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 
352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 
365), bookings = c(6L, 12L, 17L, 0L, 2L, NA, 19L, 25L, 28L, 47L, 
43L, 31L, NA, 10L, 32L, 23L, 55L, 39L, 21L, NA, 10L, 23L, 23L, 
56L, 52L, 33L, NA, 19L, 29L, 39L, 69L, 48L, 32L, NA, 21L, 28L, 
49L, 63L, 51L, 27L, NA, 18L, 25L, 54L, 64L, 61L, 22L, NA, 11L, 
18L, 25L, 13L, 20L, 14L, NA, 31L, 34L, 28L, 47L, 32L, 14L, NA, 
16L, 26L, 49L, 46L, 54L, 22L, NA, 26L, 32L, 44L, 64L, 55L, 34L, 
NA, 18L, 60L, 52L, 55L, 50L, 20L, NA, 7L, 11L, 23L, 13L, 7L, 
NA, NA, 1L, 5L, 16L, 36L, 55L, 19L, NA, 17L, 32L, 52L, 50L, 69L, 
21L, NA, 28L, 37L, 57L, 73L, 65L, 36L, NA, 26L, 16L, 41L, 60L, 
58L, 63L, NA, 7L, NA, 17L, 36L, 67L, 31L, NA, 20L, 32L, 54L, 
60L, 8L, NA, NA, 26L, 31L, 70L, 34L, 2L, 4L, NA, NA, 18L, 17L, 
41L, 73L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 31L, 11L, 17L, 26L, 14L, 
2L, 14L, 16L, 10L, 15L, 17L, 6L, 7L, 17L, 5L, 5L, 14L, 46L, 11L, 
8L, 11L, 12L, 3L, 12L, 19L, 8L, 3L, 10L, 19L, 6L, 9L, 35L, 17L, 
9L, 27L, 36L, 11L, 14L, 18L, 10L, 12L, 11L, 18L, 22L, 26L, 14L, 
NA, 12L, 20L, 38L, 39L, 39L, 19L, NA, 29L, 25L, 36L, 46L, 55L, 
27L, NA, 15L, 20L, 39L, 47L, 58L, 35L, NA, 23L, 26L, 30L, 53L, 
78L, 29L, NA, 37L, 28L, 38L, 59L, 73L, 21L, NA, 28L, 23L, 35L, 
66L, 54L, 53L, NA, 40L, 15L, 26L, 28L, 29L, 13L, NA, 12L, 30L, 
27L, 30L, 31L, 23L, NA, 43L, 27L, 29L, 79L, 62L, 30L, NA, 36L, 
25L, 51L, 55L, 55L, 32L, NA, 21L, 20L, 56L, 50L, 60L, 43L, 27L, 
NA, 27L, 22L, 39L, 48L, 67L, 25L, NA, 31L, 23L, 56L, 58L, 56L, 
22L, NA, 22L, 33L, 51L, 30L, 53L, 15L, NA, 9L, 15L, 41L, 36L, 
47L, 14L, NA, 10L, 11L, 38L, 40L, 53L, 12L, NA, 11L, 23L, 26L, 
52L, 39L, 18L, NA, 5L, 19L, 24L, 27L, 13L, 10L, NA, NA, NA, 7L, 
7L, NA, 3L, NA, NA)), row.names = c(NA, -365L), class = c("tbl_df", 
"tbl", "data.frame"))
> 

I don't know why the bookings values appear to be shown as doubles.

Their data type is integer, though.

> typeof(df_final$bookings)
[1] "integer"
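One detail in the dput() output above may be worth checking: the class vector is c("tbl_df", "tbl", "data.frame"), so df_final is a tibble rather than a plain data.frame. On a tibble, selecting a single column with `[ , j]` returns a one-column tibble whose length() is 1, whereas a plain data.frame returns a vector with one element per row. Code written for plain data frames can therefore see a "response" of length 1. The sketch below rebuilds the first 20 rows of the dput() output and converts them with as.data.frame(). Whether passing as.data.frame(df_final) to missForest resolves the error is an assumption, not something confirmed here.

```r
# Small object with the same class vector as the dput() output;
# the values are the first 20 rows of that output
toy <- structure(
  list(day_of_year = as.numeric(1:20),
       bookings = c(6L, 12L, 17L, 0L, 2L, NA, 19L, 25L, 28L, 47L,
                    43L, 31L, NA, 10L, 32L, 23L, 55L, 39L, 21L, NA)),
  row.names = c(NA, -20L),
  class = c("tbl_df", "tbl", "data.frame")
)

is_tibble_class <- inherits(toy, "tbl_df")

# as.data.frame() strips the tibble classes and keeps the columns unchanged
toy_plain   <- as.data.frame(toy)
plain_class <- class(toy_plain)

# On a plain data.frame, [ , j] with one column drops to a vector
y     <- toy_plain[, "bookings"]
y_len <- length(y)   # one element per row

# Optional comparison; only runs if the tibble package is installed
if (requireNamespace("tibble", quietly = TRUE)) {
  y_tbl <- tibble::as_tibble(toy_plain)[, "bookings"]
  print(class(y_tbl))    # still a tibble
  print(length(y_tbl))   # number of columns, not number of rows
}
```

The same conversion applied to the real data would be as.data.frame(df_final), with the result passed to missForest in place of df_final.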

0 answers:

No answers yet