Question

Suppose I have a list of files that I want to combine into a single data.table. My basic approach to this problem is to do something like this:

files <- dir(...) # The list of files to be combined

read.data <- function(loadfile) {
    # Read one CSV file and return it as a data.table
    data.table(read.csv(loadfile))
}

data.dt <- data.table(file = files)[, read.data(file), by = file]

The problem with this approach shows up when you get empty data.tables (produced by empty files that contain only a header row):

Error in `[.data.table`(data.table(file = files), , read.data(file),  :
columns of j don't evaluate to consistent types for each group

Is there any way to get data.table to seamlessly concatenate blank or NULL values correctly? That way you could do something like:

if(dim(data.dt)[1] == 0) {
    data.dt <- NULL
}

That should resolve most of the problems I'm running into.

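Edit: I should point out that I have already implemented this logic using plyr routines. ldply() works perfectly, but unfortunately it becomes very slow and memory intensive once you try to pass it more than a trivial number of files.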

Answer 1

This is a new bug in data.table. I have filed a bug report so that it won't be forgotten.

A simpler example is:

DT = data.table(a=1:3,b=1:9)
DT
      a b
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 1 4
[5,] 2 5
[6,] 3 6
[7,] 1 7
[8,] 2 8
[9,] 3 9
DT[,if (a==2) NULL else sum(b),by=a]
Error in `[.data.table`(DT, , if (a == 2) NULL else sum(b), by = a) : 
  columns of j don't evaluate to consistent types for each group

The following error, on the other hand, is correct:

DT[,if (a==2) 42 else sum(b),by=a]
Error in `[.data.table`(DT, , if (a == 2) 42 else sum(b), by = a) : 
  columns of j don't evaluate to consistent types for each group

and it is corrected by using 42L:

DT[,if (a==2) 42L else sum(b),by=a]
     a V1
[1,] 1 12
[2,] 2 42
[3,] 3 18
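
The difference between the last two calls comes down to R's types: sum() of an integer column returns an integer, while the literal 42 is a double and 42L is an integer. A minimal check in base R (the variable b here is just a stand-in for the column above):

```r
# Stand-in for the integer column b from the example above
b <- 1:9

class(sum(b))  # sum() of an integer vector stays integer
class(42)      # a bare numeric literal is a double ("numeric")
class(42L)     # the L suffix makes it an integer

# Only 42L has the same type as sum(b), so only it is consistent across groups
identical(class(sum(b)), class(42L))
identical(class(sum(b)), class(42))
```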

But until the bug is fixed, I can't think of a workaround for NULL.
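
For the original use case of combining files, one possible sketch that avoids grouping with by= altogether is to read every file into a list and stack the pieces with rbindlist(). This is not part of the answer above; it assumes a data.table version that provides rbindlist() with the idcol argument, and the temporary demo files and their names are made up for illustration.

```r
library(data.table)

# Hypothetical demo files: one with data, one containing only a header row
tmp <- tempfile("demo")
dir.create(tmp)
writeLines(c("x,y", "1,2", "3,4"), file.path(tmp, "full.csv"))
writeLines("x,y", file.path(tmp, "empty.csv"))
files <- dir(tmp, full.names = TRUE)

read.data <- function(loadfile) {
    # Read one CSV file and return it as a data.table
    as.data.table(read.csv(loadfile))
}

# Read each file separately; a header-only file yields a zero-row table
pieces <- lapply(files, read.data)
names(pieces) <- basename(files)

# Stack all pieces; the list names end up in the "file" column,
# and zero-row tables simply contribute no rows
combined <- rbindlist(pieces, idcol = "file")
print(combined)
```

Because each file is read on its own and the results are only stacked at the end, no per-group type consistency check is involved, which is where the error above comes from.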
