Hi, so I have data in the following format:
101,20130826T155649
------------------------------------------------------------------------
3,1,round-0,10552,180,yellow
12002,1,round-1,19502,150,yellow
22452,1,round-2,28957,130,yellow,30457,160,brake,31457,170,red
38657,1,round-3,46662,160,yellow,47912,185,red
I have been reading the files in and cleaning/formatting them with this code:
b <- read.table("sid-101-20130826T155649.csv", sep = ',', fill=TRUE, col.names=paste("V", 1:18,sep="") )
b$id<- b[1,1]
b<-b[-1,]
b<-b[-1,]
b$yellow<-b$V6
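And so on. There are about 300 files like this. Ideally they would all be compiled without the first two lines, since the first line is just the id and I have created a separate column to identify that data. Does anyone know how to quickly read in these tables, clean and format them the way I want, and then compile them into one big file and export it?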
Answer 0 (score: 2)
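You can use lapply to read all of the files, clean and format them, and store the resulting data frames in a list. Then use do.call with rbind to combine all of those data frames into a single large data frame.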
# Get vector of file names to read
files.to.load = list.files(pattern="csv$")
# Read the files
df.list = lapply(files.to.load, function(file) {
  df = read.table(file, sep = ',', fill=TRUE, col.names=paste("V", 1:18,sep=""))
  # ... cleaning and formatting code goes here
  df$file.name = file # In case you need to know which file each row came from
  return(df)
})
# Combine into a single data frame
df.combined = do.call(rbind, df.list)
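
To make the placeholder concrete, here is a minimal sketch that plugs the cleaning steps from the question into the lapply/do.call pattern and adds the export step. The helper names read_one and combine_files are hypothetical, as are the type.convert call (used here to re-infer column types once the id and separator rows are gone) and the output file name. It assumes every file has the same layout as the sample: an id row, a dashed separator row, then at most 18 comma-separated fields per data row.

```r
# Hypothetical helper: read one file and apply the cleaning steps from the question.
read_one <- function(file) {
  df <- read.table(file, sep = ",", fill = TRUE,
                   col.names = paste0("V", 1:18),
                   stringsAsFactors = FALSE)
  df$id <- df[1, 1]        # first field of the first row holds the id (e.g. 101)
  df <- df[-(1:2), ]       # drop the id row and the dashed separator row
  df$yellow <- df$V6       # same derived column as in the question
  # Assumption: re-infer column types now that the two non-data rows are gone
  df <- type.convert(df, as.is = TRUE)
  df$file.name <- basename(file)  # which file each row came from
  df
}

# Hypothetical helper: read every file, stack the results, optionally export.
combine_files <- function(files, out = NULL) {
  df.list <- lapply(files, read_one)
  df.combined <- do.call(rbind, df.list)
  if (!is.null(out)) write.csv(df.combined, out, row.names = FALSE)
  df.combined
}
```

A call might look like combine_files(list.files(pattern = "csv$"), out = "combined-output.txt"). Writing the result to a different directory, or with an extension other than csv as in this made-up name, keeps it from being picked up by the same pattern on a later run.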