Question

I want to combine several csv files into a list and then output them as one merged csv. Suppose the files are named file1.csv, file2.csv, file3.csv, and so on.

file1.csv     # example of what each might look like
V1 V2 V3 V4
12 12 13 15
14 12 56 23

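How can I create a list of these csv files so that I can output one merged csv that uses the file names as headings and lists the column names at the top as comments? The csv would look like this in Excel: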

# 1: V1
# 2: V2
# 3: V3
# 4: V4

file1.csv
12 12 13 15
14 12 56 23

file2.csv
12 12 13 15
14 12 56 23

file3.csv
12 12 13 15
14 12 56 23

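I tried using the list function inside a double for loop to merge these csv files, writing each list to a variable and each variable to a table output. However, this did not work as expected.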

# finding the correct files in the directory
files <- dir("test files/shortened")
files_filter <- files[grepl("*\\.csv", files)]
levels <- unique(gsub( "-.*$", "", files_filter))

# merging
for(i in 1:length(levels)){
  level_specific <- files_filter[grepl(levels[i], files_filter)]
  bindme
  for(j in 1:length(level_specific)){
    bindme2 <- read.csv(paste("test files/shortened/",level_specific[j],sep=""))
    bindme <- list(bindme,bindme2)
    assign(levels[i],bindme)
  }
  write.table(levels[i],file = paste(levels[i],"-output.csv",sep=""),sep=",")
}

Answer 1

Looking at your code, I don't think you need a for loop. With the data.table package, you can do this as follows:

library(data.table)

filenames <- list.files(pattern="\\.csv$") # all files ending in .csv in the working directory
files <- lapply(filenames, fread) # fread is the fast reading function from the data.table package
merged_data <- rbindlist(files) # stack all tables row-wise
write.csv(merged_data, file="merged_data_file.csv", row.names=FALSE)

If at least one of the csv files has column names set, they will be used in the resulting data.table.
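A single merged table keeps one header row, which is not quite the layout sketched in the question (column names as comment lines, then each file name as a heading above its rows). Here is a minimal base R sketch of that layout. It assumes every file has the same columns in the same order and that the values need no quoting; the function name `write_merged_with_headings` and its parameters are made up for illustration.

```r
# Hypothetical helper: writes the layout sketched in the question.
# Assumes every input file has the same columns in the same order.
write_merged_with_headings <- function(filenames, output_file) {
  tables <- lapply(filenames, read.csv, check.names = FALSE)

  con <- file(output_file, open = "w")
  on.exit(close(con))  # close the connection even if an error occurs

  # Column names go at the top as comment lines: "# 1: V1", "# 2: V2", ...
  col_names <- names(tables[[1]])
  writeLines(paste0("# ", seq_along(col_names), ": ", col_names), con)

  for (i in seq_along(tables)) {
    # Blank line, then the file name as a heading, then the rows without a header
    writeLines(c("", basename(filenames[i])), con)
    write.table(tables[[i]], con, sep = ",", row.names = FALSE,
                col.names = FALSE, quote = FALSE)
  }
  invisible(output_file)
}
```

You would call it as `write_merged_with_headings(filenames, "some/other/dir/merged.csv")`. Writing the output outside the input directory keeps it from being picked up as an input on the next run.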

As for your code, it can be improved considerably. This:

files <- dir("test files/shortened")
files_filter <- files[grepl("*\\.csv", files)]

can be replaced with:

filenames <- list.files("test files/shortened", pattern="\\.csv$")

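The bare bindme at the start of your for loop does not do anything (and if the object does not exist yet, R stops with an "object not found" error). What is it meant to be? A list? A data frame? You could initialise it with something like: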

bindme <- data.table() # or data.frame()

Furthermore, the part:

write.table(levels[i],file = paste(levels[i],"-output.csv",sep=""),sep=",")

will generate several csv files, but you only want one merged file.
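If one output file per level really is what you are after, the loop from the question could be restructured roughly as below. This is only a sketch in base R: it assumes the level is everything before the first "-" in the file name (as in the question's gsub() call) and that all files of a level share the same columns. The function name `merge_by_level` and the parameters `input_dir` and `output_dir` are hypothetical.

```r
# Sketch: group the files by level, bind each group once, write one file per level.
# `input_dir` and `output_dir` are hypothetical parameter names.
merge_by_level <- function(input_dir, output_dir = ".") {
  files <- list.files(input_dir, pattern = "\\.csv$")

  # Level = everything before the first "-", as in the question's gsub() call
  file_levels <- gsub("-.*$", "", files)

  # split() returns a named list: one character vector of file names per level
  groups <- split(files, file_levels)

  for (level in names(groups)) {
    # Read all files of this level into a list, then stack them row-wise
    tables <- lapply(file.path(input_dir, groups[[level]]), read.csv)
    merged <- do.call(rbind, tables)
    write.csv(merged, file.path(output_dir, paste0(level, "-output.csv")),
              row.names = FALSE)
  }
  invisible(names(groups))
}
```

Grouping with split() avoids the grepl(levels[i], ...) lookup, which would also match a level name that merely appears somewhere inside another file name. Use an `output_dir` different from `input_dir`, otherwise the "-output.csv" files are read as inputs on the next run.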

Answer 2

Does this help?

library(plyr) # provides ldply()

mergeMultipleFiles <- function(dirPath, nameRegex, outputFilename){
  # find the matching files, including those in subdirectories
  filenames <- list.files(path=dirPath, pattern=nameRegex, full.names=TRUE, recursive=TRUE)
  dataList <- lapply(filenames, read.csv, header=TRUE, check.names=FALSE)
  # stack the data frames row-wise
  combinedData <- ldply(dataList, rbind)
  write.csv(combinedData, outputFilename)
}

PS: the file name argument is a regular expression, in case you only want to merge files whose names match a certain pattern.
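To illustrate how such a pattern narrows the selection, here is a small self-contained demo using only list.files() in a temporary directory. The file names are invented for the example.

```r
# Self-contained demo in a temporary directory; the file names are made up.
demo_dir <- tempfile("merge-demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("sales-2020.csv", "sales-2021.csv",
                                  "costs-2020.csv", "notes.txt")))

# "\\.csv$" matches every file whose name ends in .csv
all_csv <- list.files(demo_dir, pattern = "\\.csv$")

# "^sales-.*\\.csv$" keeps only the csv files whose names start with "sales-"
sales_only <- list.files(demo_dir, pattern = "^sales-.*\\.csv$")
```

Here `all_csv` holds the three csv files and `sales_only` only the two sales files. The same kind of pattern string could be passed as `nameRegex`.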

Answer 3

Adapt this example. If I understand your question correctly, it should help.

# get the names of the csv files in your current directory
file_names = list.files(pattern = "[.]csv$")

# for every name you found go and read the csv with that name
# (this creates a list of data frames)
import_files = lapply(file_names, read.csv)

# append those data frames one after the other (collapse list elements to one dataset) and save it as d
d = do.call(rbind, import_files)
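Once the rows are stacked like this, you can no longer tell which file each row came from. One possible way around that, sketched here in base R, is to tag each data frame with its file name before binding. The column name `source_file` and both function names are hypothetical, and the sketch assumes all files share the same columns.

```r
# Read one csv and record where it came from.
read_with_source <- function(path) {
  data <- read.csv(path)
  # Hypothetical extra column holding the originating file name;
  # rep() keeps this valid even for a file with zero data rows
  data$source_file <- rep(basename(path), nrow(data))
  data
}

# Read every file and stack the results row-wise.
# Assumption: all csv files share the same columns.
combine_with_source <- function(file_names) {
  do.call(rbind, lapply(file_names, read_with_source))
}
```

Calling `combine_with_source(file_names)` then gives one data frame in which the rows of each file can be picked out again with, for example, `split(d, d$source_file)`.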

Merging a bunch of csv files into one file with a header
