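I am getting an error when using data.table. The error message is:
Error in `[.data.table`(x.out, , if (all(V3 > 25)) c(as.character(V1[1]), :
  j doesn't evaluate to the same number of columns for each group
I would like to know how to fix this error.
My data looks like this (see the bottom of this post for a reproducible form of the data produced with dput):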
c007d.1 1 2
c007d.1 2 2
c007d.1 3 2
c007d.1 4 31
c007d.1 5 55
c007d.1 6 60
c007d.1 7 13
When I run the code shown here:
library(data.table)
x.out$grp <- rep(1:ceiling(nrow(x.out)/3),each=3)
output <- x.out[, if(all(V3 > 25)) c(as.character(V1[1]),
V2[1], V2[3], as.list(V3)), by = grp]
the output is as follows:
grp V1 V2 V3 V4 V5 V6
1: 2 d3.1 4 6 31 55 60
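The code checks whether column 3 has 3 back-to-back rows with values over 25 and, if so, prints the first and last values of column 2 for those rows.
This works for the small data shown here, but when I run it on a file with 16,000 rows I get the error mentioned above. Is there a way around it? I am not particularly set on using data.table, but I understand it is faster than the other options.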
Here is the output of dput(x.out):
> head(dput(x.out))
structure(list(V1 = c("c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1",
"c007d.1", "c007d.1", "c007d.1", "c007d.1"), V2 = 1:287,
V3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 3, 24, 33, 43, 47, 48, 48, 48, 50, 53, 63, 70, 78, 82,
82, 82, 82, 82, 82, 84, 84, 84, 87, 88, 88, 93, 103, 138,
158, 175, 186, 222, 319, 398, 487, 540, 554, 574, 581, 584,
587, 588, 587, 559, 557, 557, 557, 556, 556, 556, 556, 556,
556, 554, 554, 546, 542, 530, 478, 462, 454, 437, 412, 374,
246, 244, 211, 54, 49, 1, 1, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
4, 5, 6, 6, 6, 7, 7, 8, 10, 12, 21, 68, 147, 533, 588, 600,
601, 620, 646, 666, 694, 709, 725, 729, 737, 743, 750, 784,
805, 829, 849, 907, 929, 957, 982, 984), grp = c(1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L,
11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L,
15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L,
19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L,
23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L, 26L, 27L,
27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 31L,
31L, 31L, 32L, 32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 35L,
35L, 35L, 36L, 36L, 36L, 37L, 37L, 37L, 38L, 38L, 38L, 39L,
39L, 39L, 40L, 40L, 40L, 41L, 41L, 41L, 42L, 42L, 42L, 43L,
43L, 43L, 44L, 44L, 44L, 45L, 45L, 45L, 46L, 46L, 46L, 47L,
47L, 47L, 48L, 48L, 48L, 49L, 49L, 49L, 50L, 50L, 50L, 51L,
51L, 51L, 52L, 52L, 52L, 53L, 53L, 53L, 54L, 54L, 54L, 55L,
55L, 55L, 56L, 56L, 56L, 57L, 57L, 57L, 58L, 58L, 58L, 59L,
59L, 59L, 60L, 60L, 60L, 61L, 61L, 61L, 62L, 62L, 62L, 63L,
63L, 63L, 64L, 64L, 64L, 65L, 65L, 65L, 66L, 66L, 66L, 67L,
67L, 67L, 68L, 68L, 68L, 69L, 69L, 69L, 70L, 70L, 70L, 71L,
71L, 71L, 72L, 72L, 72L, 73L, 73L, 73L, 74L, 74L, 74L, 75L,
75L, 75L, 76L, 76L, 76L, 77L, 77L, 77L, 78L, 78L, 78L, 79L,
79L, 79L, 80L, 80L, 80L, 81L, 81L, 81L, 82L, 82L, 82L, 83L,
83L, 83L, 84L, 84L, 84L, 85L, 85L, 85L, 86L, 86L, 86L, 87L,
87L, 87L, 88L, 88L, 88L, 89L, 89L, 89L, 90L, 90L, 90L, 91L,
91L, 91L, 92L, 92L, 92L, 93L, 93L, 93L, 94L, 94L, 94L, 95L,
95L, 95L, 96L, 96L)), .Names = c("V1", "V2", "V3", "grp"), row.names = c(NA,
-287L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x7fdb7b006d78>)
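Answer 0 (score: 2)
The error message is actually fairly self-explanatory.
You should provide a reproducible example. Here is one (it replaces the V3 value in row 7 of your example):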
x.out <- data.table(structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = "c007d.1", class = "factor"),
V2 = 1:7, V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L)),
.Names = c("V1", "V2", "V3"),
class = c("data.frame"), row.names = c(NA, -7L)))
# add your grouping column (data.table style)
x.out[, grp := rep(seq_len(ceiling(.N/3)), each = 3,length.out=.N)]
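Your problem is that grp = 2 has 3 rows of V3 while grp = 3 has only 1, so when you use as.list (together with c) you create lists of different lengths, and j returns a different number of columns for each group.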
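To make the mismatch visible, here is a minimal sketch that rebuilds the same 7-row example and counts, per group, how many rows there are and how many elements j would return. The names sizes, widths and n_cols are made up for this illustration, and V1 is stored as character rather than factor for brevity.

```r
library(data.table)

# Same 7-row example as above; V3 in row 7 is 26, so grp 3 passes the filter
x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Rows per group: the last group is incomplete
sizes <- x.out[, .N, by = grp]

# Elements j would return per group: 3 fixed values plus one per V3 row
# (0 stands in for groups that fail the filter and return nothing)
widths <- x.out[, .(n_cols = if (all(V3 > 25)) 3L + .N else 0L), by = grp]

print(sizes)
print(widths)
```

Any two groups that pass the filter with different n_cols values will trigger the error.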
How do you want to fill in the missing columns for grp = 3?
Edit:
In your reproducible example, every grp has 3 rows except grp 96.
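Depending on how you want to treat the incomplete last group, here are two possible sketches, shown on the 7-row example. Both are only suggestions under stated assumptions: option A assumes an incomplete group can simply be skipped, and option B assumes missing values should be padded with NA. The names res_drop and res_pad are made up for this illustration.

```r
library(data.table)

x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Option A (assumption: groups with fewer than 3 rows are skipped)
res_drop <- x.out[, if (.N == 3L && all(V3 > 25))
  c(list(as.character(V1[1]), V2[1], V2[.N]), as.list(V3)),
  by = grp]

# Option B (assumption: short groups are kept and padded with NA);
# V3[1:3] always has length 3, with NA beyond the end of the group
res_pad <- x.out[, if (all(V3 > 25))
  c(list(as.character(V1[1]), V2[1], V2[.N]), as.list(V3[1:3])),
  by = grp]

print(res_drop)
print(res_pad)
```

In both cases j returns 6 elements for every group that passes the filter, so the column counts agree. Note that V2[.N] is used instead of V2[3] so that the "last row" is still defined when a group has fewer than 3 rows.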