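I have been following some posts: "How to combine multiple .csv files in R?" and "Reading Many CSV Files at the Same Time in R and Combining All into one dataframe".
My goal is essentially the same: combine multiple, very large CSV files into one big matrix in R. I have the solution below, and I would like to speed it up as much as possible:
Here is a fully reproducible example; my real files are larger and far more numerous.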
setwd("C:/") #### set an easy directory to create acceptably large files
#### this takes about 60 seconds
for(i in 1:80){
print(80-i)
write.table(matrix(rnorm(20*3891,0,1),ncol=20),col.names=F,row.names=F,sep=",",file=paste(i,"file.csv",sep=""))
}
listfiles<-list.files(path="C:/",pattern="\\.csv$") #### pattern is a regular expression, not a glob
#### now the problem: this takes about 30-40 seconds; as I have bigger (and many more) files I want to speed up this step
library(plyr)
mybigmatrix<-ldply(listfiles,read.csv,header=F)
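Thanks in advance for any help.
Answer 0 (score: 0)
You could try a specialized package such as readr and its function read_csv(). Note that read_csv() has no header argument; use col_names=FALSE for files without a header line.
mybigmatrix<-ldply(listfiles,readr::read_csv,col_names=FALSE)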
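Another package that is often suggested for this kind of job is data.table. The following is only a minimal sketch, assuming data.table is installed; the scratch directory `demo_dir` and the small file sizes are made up for illustration, and no timings are claimed for the poster's data. It reads each file with fread() and stacks the pieces once with rbindlist().

```r
library(data.table)  # assumption: data.table is installed

# Hypothetical scratch directory with a few small header-less CSV files
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:4) {
  write.table(matrix(rnorm(20 * 50), ncol = 20),
              file = file.path(demo_dir, paste0(i, "file.csv")),
              sep = ",", col.names = FALSE, row.names = FALSE)
}

# "\\.csv$" is a regular expression; full.names = TRUE avoids setwd()
listfiles <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file with fread() and stack the pieces in a single step
pieces <- lapply(listfiles, fread, header = FALSE, sep = ",")
mybigmatrix <- as.matrix(rbindlist(pieces))

dim(mybigmatrix)
```

Binding once at the end avoids repeatedly copying a growing result, which tends to matter more as the number of files increases.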
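Answer 1 (score: 0)
Here is a fully reproducible example that shows a problem I have with fread(): I cannot coerce the resulting data.table object into a numeric matrix.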
library(data.table)
setwd("C:/") #### set an easy directory to create acceptably large files
#### this takes a few seconds
for(i in 1:5){
print(5-i)
write.table(matrix(rnorm(5*3891,0,1),nrow=5),col.names=F,row.names=F,sep=",",file=paste(i,"file.csv",sep=""))
}
listfiles<-list.files(path="C:/",pattern="\\.csv$")
myfread<-function(file){
data_frame <- fread(file,sep=",",header=FALSE,stringsAsFactors=FALSE,select=c(1:3891),colClasses=c(rep("as.numeric",3891)))
data_frame
}
###### this is a matrix 25*3891 I want an array of 1297x3x25
alld<-rbindlist(lapply(listfiles,myfread))
### why this is in characters??
as.matrix(alld)
k<-1297
m<-3
vectorr<-as.vector(t(as.matrix(alld)))
tem <- vectorr
n <- length(tem)/(k * m)
tem <- array(tem, c(m, k, n))
tem <- aperm(tem, c(2, 1, 3))
xup <- tem ####### here I have characters
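One likely cause of the character values (an assumption, not checked against the poster's data.table version) is the colClasses argument: it expects type names such as "numeric", whereas "as.numeric" is the name of a function, so the columns may be left as character. The sketch below uses small stand-in sizes (k = 4 and m = 3 instead of 1297 and 3) and a hypothetical scratch directory, then applies the same reshaping as above.

```r
library(data.table)  # assumption: data.table is installed

k <- 4; m <- 3                      # small stand-ins for 1297 and 3
demo_dir <- file.path(tempdir(), "fread_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
  write.table(matrix(rnorm(5 * k * m), nrow = 5),
              file = file.path(demo_dir, paste0(i, "file.csv")),
              sep = ",", col.names = FALSE, row.names = FALSE)
}
listfiles <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# colClasses takes type names such as "numeric", not function names
myfread <- function(file) {
  fread(file, sep = ",", header = FALSE, colClasses = "numeric")
}
alld <- as.matrix(rbindlist(lapply(listfiles, myfread)))
stopifnot(is.numeric(alld))         # no character columns

n <- nrow(alld)                     # one slice per input row (10 here)
# t() lays the values out row by row; fill an m x k x n array, then
# swap the first two dimensions to obtain k x m x n
xup <- aperm(array(as.vector(t(alld)), c(m, k, n)), c(2, 1, 3))
dim(xup)
```

With this layout, xup[i, j, r] holds element (i - 1) * m + j of row r of the combined matrix, which matches the reshaping in the code above.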
Answer 2 (score: 0)
I think any of these options should work for you.
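# Option 1: read every file in the working directory and stack the rows
setwd("C:/Users/your_path_here/test")
fnames <- list.files()
csv <- lapply(fnames, read.csv)
result <- do.call(rbind, csv)
# Option 2: same idea without changing the working directory
# (setwd() returns the previous directory, so store the path itself)
filedir <- "C:/Users/your_path_here/csv_files"
file_names <- dir(filedir, full.names = TRUE)
your_data_frame <- do.call(rbind, lapply(file_names, read.csv))
# Option 3: skip the first line of each file and read the rest without a header
filedir <- "C:/Users/your_path_here/csv_files"
file_names <- dir(filedir, full.names = TRUE)
your_data_frame <- do.call(rbind, lapply(file_names, read.csv, skip = 1, header = FALSE))
# Option 4: files that have no header line at all
filedir <- "C:/Users/your_path_here/csv_files"
file_names <- dir(filedir, full.names = TRUE)
your_data_frame <- do.call(rbind, lapply(file_names, read.csv, header = FALSE))
# Option 5: keep the files as a list of data frames
# (read.csv rather than read.delim, which expects tab-separated input)
setwd("C:/Users/Excel/Desktop/test")
temp <- list.files(pattern = "\\.csv$")
myfiles <- lapply(temp, read.csv)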
Finally, try this:
setwd("C:/Users/your_path_here/")
file_list <- list.files()
# adjust header and sep to your files; this version expects tab-separated files with a header
for (file in file_list){
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE, sep="\t")
  } else {
    # otherwise append to it (using else avoids reading the first file twice)
    temp_dataset <- read.table(file, header=TRUE, sep="\t")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}
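For the header-less, purely numeric files in the question, a base R alternative is to read each file with scan() and bind the pieces once at the end. This is a minimal sketch under stated assumptions: the helper `read_numeric_csv`, the scratch directory `demo_dir`, and the file sizes are hypothetical, and the column count must be known in advance.

```r
# Hypothetical scratch directory with header-less, purely numeric CSV files
demo_dir <- file.path(tempdir(), "scan_demo")
dir.create(demo_dir, showWarnings = FALSE)
n_col <- 20
for (i in 1:3) {
  write.table(matrix(rnorm(n_col * 100), ncol = n_col),
              file = file.path(demo_dir, paste0(i, "file.csv")),
              sep = ",", col.names = FALSE, row.names = FALSE)
}
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# scan() returns one long numeric vector per file; byrow = TRUE restores rows
read_numeric_csv <- function(file, n_col) {
  matrix(scan(file, what = numeric(), sep = ",", quiet = TRUE),
         ncol = n_col, byrow = TRUE)
}

# Collect the pieces in a list, then bind once instead of growing in a loop
pieces <- lapply(files, read_numeric_csv, n_col = n_col)
big_matrix <- do.call(rbind, pieces)
dim(big_matrix)
```

Because scan() is told the type up front, it skips the per-column type detection that read.csv() performs, and the result is already a matrix rather than a data frame.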