I'm trying to import multiple .csv files, each at least 1 GB, using lapply and read_csv_chunked. To create sample data, run:
A <- data.frame(id = sample(c("A","B","C"), 100, replace = T), k = rnorm(100), j = rnorm(100), l = rnorm(100)); write_csv(A, "A.csv")
B <- data.frame(id_limp = sample(c("A","B","C"), 100, replace = T), k = rnorm(100), j = rnorm(100), l = rnorm(100)); write_csv(B, "B.csv")
C <- data.frame(id = sample(c("A","B","C"), 100, replace = T), k = rnorm(100), j = rnorm(100), l = rnorm(100)); write_csv(C, "C.csv")
D <- data.frame(id_samp = sample(c("A","B","C"), 100, replace = T), k = rnorm(100), j = rnorm(100), l = rnorm(100)); write_csv(D, "D.csv")
E <- data.frame(id = sample(c("A","B","C"), 100, replace = T), k = rnorm(100), j = rnorm(100), l = rnorm(100)); write_csv(E, "E.csv")
The code I'm using with read_csv_chunked is as follows:
f <- function(x, pos) if("id" %in% names(x)) { subset(x, id == "A")} else if ("id_limp" %in% names(x)) {subset(x, id_limp == "A")} else {subset(x, id_samp == "A")}
file_list <- list.files(pattern = ".csv")
data <- lapply(file_list, read_csv_chunked, callback = DateFrameCallback$new(f),chunk_size = 10)
names(data)<- tolower(word(gsub("[[:punct:]]+"," ",file_list), 1))
list2env(data, envir = .GlobalEnv)
rm(data)
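Here f is the callback function that read_csv_chunked needs (or at least it should be), file_list is the list of .csv files in the folder, data holds the result of the lapply call, names(data) names the tibbles in data, and list2env writes those tibbles out to the global environment.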
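The error I get is:
Error in as_chunk_callback(callback) :
  object 'DateFrameCallback' not found
Can someone explain why I'm getting this error? I don't get the error message when I read a single csv on its own instead of reading them all at once. Is there a fix, or a better approach altogether?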
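For reference, here is a minimal sketch of the workflow described above. It assumes the class meant is readr's exported `DataFrameCallback` (the error message shows the name typed as `DateFrameCallback`). The temporary directory, the two small sample files, and the helper names `make_sample`, `keep_a` and `read_filtered` are made up for illustration and are not part of my real setup.

```r
library(readr)

# Illustrative sample files in a temporary directory (made-up data).
demo_dir <- tempfile("chunked_demo")
dir.create(demo_dir)
set.seed(1)

make_sample <- function(id_name) {
  d <- data.frame(id = sample(c("A", "B", "C"), 100, replace = TRUE),
                  k = rnorm(100))
  names(d)[1] <- id_name  # vary the id column name per file
  d
}
write_csv(make_sample("id"), file.path(demo_dir, "A.csv"))
write_csv(make_sample("id_limp"), file.path(demo_dir, "B.csv"))

# Callback: keep rows whose id column (whatever it is called) equals "A".
id_cols <- c("id", "id_limp", "id_samp")
keep_a <- function(x, pos) {
  col <- intersect(id_cols, names(x))[1]
  x[x[[col]] == "A", ]
}

# Read one file in chunks; DataFrameCallback row-binds the filtered chunks.
read_filtered <- function(path) {
  read_csv_chunked(path,
                   callback = DataFrameCallback$new(keep_a),
                   chunk_size = 10)
}

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
data <- lapply(file_list, read_filtered)
names(data) <- tolower(tools::file_path_sans_ext(basename(file_list)))
```

Each element of `data` should then be a tibble holding only the rows with id "A" from the corresponding file. Whether this scales to the 1 GB files is something I have not checked.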