'j' doesn't evaluate to the same number of columns for each group in data.table

Time: 2014-03-24 22:10:27

Tags: r data.table

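I am running into an error when using data.table. The error message is:

Error in [.data.table(x.out, , if (all(V3 > 25)) c(as.character(V1[1]), :     j doesn't evaluate to the same number of columns for each group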

I would like to know how to fix this error.

My data looks like this (see the bottom of this post for a reproducible version produced with dput):

c007d.1    1       2
c007d.1    2       2
c007d.1    3       2
c007d.1    4       31
c007d.1    5       55
c007d.1    6       60
c007d.1    7       13 

When I run the code shown here:

library(data.table)
x.out$grp <- rep(1:ceiling(nrow(x.out)/3),each=3)
output <- x.out[, if(all(V3 > 25)) c(as.character(V1[1]), 
                  V2[1], V2[3], as.list(V3)), by = grp]

the output is as follows:

      grp   V1  V2 V3 V4 V5 V6
1:    2    d3.1  4  6 31 55 60 

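The code checks whether column 3 has 3 back-to-back rows with values above 25 and, if so, prints the first and last of those rows' values from column 2.

This works on the small data shown here. But when I run it on a file with 16,000 rows, I get the error mentioned above. Is there a way to fix it? I am not particularly attached to data.table, but I understand it is faster than the other options.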


Here is the output of dput(x.out):

> head(dput(x.out))

structure(list(V1 = c("c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1"), V2 = 1:287, 
V3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 3, 24, 33, 43, 47, 48, 48, 48, 50, 53, 63, 70, 78, 82, 
82, 82, 82, 82, 82, 84, 84, 84, 87, 88, 88, 93, 103, 138, 
158, 175, 186, 222, 319, 398, 487, 540, 554, 574, 581, 584, 
587, 588, 587, 559, 557, 557, 557, 556, 556, 556, 556, 556, 
556, 554, 554, 546, 542, 530, 478, 462, 454, 437, 412, 374, 
246, 244, 211, 54, 49, 1, 1, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 
4, 5, 6, 6, 6, 7, 7, 8, 10, 12, 21, 68, 147, 533, 588, 600, 
601, 620, 646, 666, 694, 709, 725, 729, 737, 743, 750, 784, 
805, 829, 849, 907, 929, 957, 982, 984), grp = c(1L, 1L, 
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 
11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 
15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 
19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L, 
23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L, 26L, 27L, 
27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 31L, 
31L, 31L, 32L, 32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 35L, 
35L, 35L, 36L, 36L, 36L, 37L, 37L, 37L, 38L, 38L, 38L, 39L, 
39L, 39L, 40L, 40L, 40L, 41L, 41L, 41L, 42L, 42L, 42L, 43L, 
43L, 43L, 44L, 44L, 44L, 45L, 45L, 45L, 46L, 46L, 46L, 47L, 
47L, 47L, 48L, 48L, 48L, 49L, 49L, 49L, 50L, 50L, 50L, 51L, 
51L, 51L, 52L, 52L, 52L, 53L, 53L, 53L, 54L, 54L, 54L, 55L, 
55L, 55L, 56L, 56L, 56L, 57L, 57L, 57L, 58L, 58L, 58L, 59L, 
59L, 59L, 60L, 60L, 60L, 61L, 61L, 61L, 62L, 62L, 62L, 63L, 
63L, 63L, 64L, 64L, 64L, 65L, 65L, 65L, 66L, 66L, 66L, 67L, 
67L, 67L, 68L, 68L, 68L, 69L, 69L, 69L, 70L, 70L, 70L, 71L, 
71L, 71L, 72L, 72L, 72L, 73L, 73L, 73L, 74L, 74L, 74L, 75L, 
75L, 75L, 76L, 76L, 76L, 77L, 77L, 77L, 78L, 78L, 78L, 79L, 
79L, 79L, 80L, 80L, 80L, 81L, 81L, 81L, 82L, 82L, 82L, 83L, 
83L, 83L, 84L, 84L, 84L, 85L, 85L, 85L, 86L, 86L, 86L, 87L, 
87L, 87L, 88L, 88L, 88L, 89L, 89L, 89L, 90L, 90L, 90L, 91L, 
91L, 91L, 92L, 92L, 92L, 93L, 93L, 93L, 94L, 94L, 94L, 95L, 
95L, 95L, 96L, 96L)), .Names = c("V1", "V2", "V3", "grp"), row.names = c(NA, 
-287L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x7fdb7b006d78>)

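1 Answer:

Answer 0: (score: 2)

The error message is actually self-explanatory.

You should provide a reproducible example. Here is one (with V3 in row 7 of your example replaced):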

x.out <- data.table(structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), 
 .Label = "c007d.1", class = "factor"), 
  V2 = 1:7, V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L)), 
.Names = c("V1",  "V2", "V3"),
 class = c("data.frame"), row.names = c(NA, -7L)))


# add your grouping column (data.table style)
x.out[, grp := rep(seq_len(ceiling(.N/3)), each = 3,length.out=.N)]

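Your problem is that grp = 2 has 3 rows of V3 while grp = 3 has only 1, so when you use as.list (together with c) you create lists of different lengths, and j therefore returns a different number of columns for those two groups.

How do you want to fill in the missing columns for grp = 3?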
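To see the mismatch concretely, here is a small sketch that rebuilds the same 7-row example and counts, per group, how many rows there are and how many list elements j would return. The names `sizes` and `widths` are made up for this illustration, and V1 is built as a plain character column rather than a factor, which does not affect the counts.

```r
library(data.table)

# Same 7-row example as above; the last group is incomplete.
x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Rows per group: groups 1 and 2 have 3 rows, group 3 has only 1.
sizes <- x.out[, .N, by = grp]
print(sizes)

# For each group: does it pass the V3 > 25 test, and how many elements
# would j return (3 fixed elements plus one per V3 value)?
widths <- x.out[, .(passes = all(V3 > 25), n_cols = 3L + .N), by = grp]
print(widths)
```

Groups 2 and 3 both pass the test but would return 6 and 4 elements respectively, which is exactly the situation the error message complains about.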
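Depending on the answer to that question, two possible ways forward are sketched below. Both are assumptions about what you might want, not the only valid choices: option A simply skips groups that do not have exactly 3 rows, and option B pads short groups with NA so that every group returns the same number of columns. The result names `strict` and `padded` are hypothetical.

```r
library(data.table)

# Same 7-row example as above (V1 as character instead of factor).
x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Option A (assumption: incomplete groups should be ignored).
# Only groups with exactly 3 rows can pass, so j always returns 6 columns.
strict <- x.out[, if (.N == 3L && all(V3 > 25))
                    c(list(as.character(V1[1]), V2[1], V2[.N]), as.list(V3)),
                by = grp]
print(strict)

# Option B (assumption: short groups should be kept and padded with NA).
# V3[1:3] always has length 3; positions past the end of V3 become NA.
padded <- x.out[, if (all(V3 > 25))
                    c(list(as.character(V1[1]), V2[1], V2[.N]), as.list(V3[1:3])),
                by = grp]
print(padded)
```

With option A only grp 2 is returned. With option B grp 3 is returned as well, with NA in the last two V3 positions, so whether that row is meaningful depends on what "3 back-to-back rows" should mean for a trailing partial group.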

Edit:

For your reproducible example, every grp has 3 rows except grp 96, which has only 2.
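A quick way to locate such incomplete groups is sketched below. It uses a stand-in table `dt` with 287 rows (the same row count as the dput output in the question) instead of the full data, and the name `incomplete` is chosen here only for illustration.

```r
library(data.table)

# Stand-in with the same number of rows as the question's data.
dt <- data.table(V2 = 1:287)
dt[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Count rows per group and keep only the groups that are not full triples.
incomplete <- dt[, .N, by = grp][N != 3L]
print(incomplete)
```

Running the same check on the 16,000-row file should show which group(s) make j return a different number of columns.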