Question

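Apologies if this has already been answered elsewhere. I am new to R, and I have spent two days trying to get past this initial hurdle.

I have been given a dataset made up of about 2,000 separate data files, and I would like to merge them into one very large dataset. I found a few methods that people suggest, but none of them worked for me. For example, one blog (http://psychwire.wordpress.com/2011/06/03/merge-all-files-in-a-directory-using-r-into-a-single-dataframe/) suggests the following code: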

setwd("target_dir/")

file_list <- list.files()

for (file in file_list){

  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE, sep="\t")
  }

  # if the merged dataset does exist, append to it
  if (exists("dataset")){
    temp_dataset <-read.table(file, header=TRUE, sep="\t")
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}

When I run this code (with 'target_dir' changed to the correct directory), R gives me the following:

Error in match.names(clabs, names(xi)) : 
  names do not match previous names

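My hunch is that either I failed to change one of the variables in the code that needs to be adapted to my particular data (I changed 'target_dir' to the correct directory but changed nothing else), or the problem is that the .dat files do not have any column headers. If the latter is the case, my second question is whether there is a way to use R to give multiple .dat files the same column headers.

Thank you very much for taking the time to read this.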

Answer 1

Try this:

setwd("target_dir/")

file_list <- list.files()

for (file in file_list){

  # read the current file; the .dat files have no header row
  temp_dataset <- read.table(file, header=FALSE, sep="\t",
                             col.names = c("a", "b", "c"))

  if (!exists("dataset")){
    # if the merged dataset doesn't exist, create it from the first file
    dataset <- temp_dataset
  } else {
    # otherwise append to it (the else keeps the first file from being added twice)
    dataset <- rbind(dataset, temp_dataset)
  }
  rm(temp_dataset)
}

Replace c("a", "b", "c") with the names you want to use for the columns. Alternatively, omit the col.names argument and R will use V1, V2, and so on.
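To see why header=TRUE likely caused the error in the question, here is a small sketch. It uses two made-up, in-memory "files" (file_a and file_b are hypothetical stand-ins for the real .dat files) and assumes the real files are tab-separated with no header row.

```r
# Two tiny headerless "files" held in memory (hypothetical data).
file_a <- "1\t2\t3\n4\t5\t6"
file_b <- "7\t8\t9\n10\t11\t12"

# With header = TRUE the first data row is consumed as column names,
# so each data frame ends up with different names.
a_bad <- read.table(text = file_a, header = TRUE, sep = "\t")
b_bad <- read.table(text = file_b, header = TRUE, sep = "\t")
names(a_bad)  # "X1" "X2" "X3"
names(b_bad)  # "X7" "X8" "X9"

# rbind() refuses to stack data frames whose names differ;
# capture the error message instead of stopping the script.
bad_result <- tryCatch(rbind(a_bad, b_bad),
                       error = function(e) conditionMessage(e))

# With header = FALSE every file gets the same default names,
# so the rows stack without complaint.
a_ok <- read.table(text = file_a, header = FALSE, sep = "\t")
b_ok <- read.table(text = file_b, header = FALSE, sep = "\t")
combined <- rbind(a_ok, b_ok)
names(combined)  # "V1" "V2" "V3"
```

Note that header=TRUE also silently drops one row of real data per file, since that row is used up as the header.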

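However, as pointed out in the comments, it is better not to use a for loop. Use lapply to read in all the data frames, and then use do.call(rbind, ...) or plyr::rbind.fill to stack the data frames you have read.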
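A minimal sketch of the lapply approach might look like the following. The temporary directory, the three demo files, and the column names "a", "b", "c" are all made up for illustration; in practice target_dir would point at the folder holding the real .dat files, and the setup loop would be dropped.

```r
# Hypothetical setup: write three small headerless, tab-separated files
# into a temporary directory so the example can run on its own.
target_dir <- file.path(tempdir(), "dat_demo")
dir.create(target_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(i, i * 2, i * 3),
              file = file.path(target_dir, sprintf("file%d.dat", i)),
              sep = "\t", row.names = FALSE, col.names = FALSE)
}

# List only the .dat files, with full paths so setwd() is not needed.
file_list <- list.files(target_dir, pattern = "\\.dat$", full.names = TRUE)

# Read every file into its own data frame, giving all of them the same
# column names. The extra arguments are passed through to read.table().
data_frames <- lapply(file_list, read.table,
                      header = FALSE, sep = "\t",
                      col.names = c("a", "b", "c"))

# Stack the list of data frames into a single data frame in one call.
dataset <- do.call(rbind, data_frames)
```

Reading everything first and binding once avoids growing the data frame inside a loop, which tends to get slow as the number of files increases.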
