How to process multiple csv files in R

Date: 2014-12-23 20:13:26

Tags: r csv for-loop

I have many csv files in three separate folders, as follows:

folder1
a1_0023.csv
a2_0034.csv
a3_6163.csv
...
(100 files)

folder2
b1_0023.csv
b2_0034.csv
b3_6163.csv
...
(100 files)

folder3
c1_0023.csv
c2_0034.csv
c3_6163.csv
...
(100 files)

I have a text file that lists the last four digits of the file names:

theLastFourDigits.txt
0023
0034
6163
...
(100 lines)
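One detail worth checking when this list is read into R: if the suffixes are parsed as numbers (read.table() does this by default), "0023" becomes the integer 23 and the leading zeros needed to rebuild the file names are lost. A minimal sketch that keeps them as strings, assuming one suffix per line; read_suffixes() is a hypothetical helper name and the file location in the usage comment is an assumption.

```r
# Hypothetical helper: read one suffix per line as character strings,
# so leading zeros such as "0023" are preserved
read_suffixes <- function(path) {
  x <- trimws(readLines(path, warn = FALSE))  # warn = FALSE: tolerate a missing final newline
  x[nzchar(x)]                                # drop blank lines
}

# Usage (location of the list file is an assumption):
# digits <- read_suffixes("D:/theLastFourDigits.txt")
```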

For the 0023 files, I did a simple job in R:

a <- read.table("D:/folder1/a1_0023.csv", header=FALSE, sep=",")
a <- as.matrix(a)
b <- read.table("D:/folder2/b1_0023.csv", header=FALSE, sep=",")
b <- as.matrix(b)
c <- read.table("D:/folder3/c1_0023.csv", header=FALSE, sep=",")
c <- as.matrix(c)

# Initiate the column vector that contains the results
myanswer <- matrix(0, nrow=100, ncol=1)

# Do a simple job, and store the result in myanswer column
myanswer[1] = nrow(a)*nrow(b)/nrow(c)
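The single-file job above can be wrapped in a function of the suffix and repeated for every suffix with vapply(). Because the prefix before the underscore changes from file to file (a1, a2, ...), this sketch locates each file by its _<suffix>.csv ending instead of building the full name. It is a minimal sketch under those assumptions; find_csv(), job_for_suffix() and the folder paths in the usage comment are hypothetical.

```r
# Hypothetical helper: the single csv in `dir` whose name ends in _<suffix>.csv
find_csv <- function(dir, suffix) {
  f <- list.files(dir, pattern = paste0("_", suffix, "\\.csv$"), full.names = TRUE)
  if (length(f) != 1) {
    stop("expected one csv for suffix ", suffix, " in ", dir, ", found ", length(f))
  }
  f
}

# The question's job for one suffix: nrow(a) * nrow(b) / nrow(c),
# where a, b, c come from the three folders (files have no header row)
job_for_suffix <- function(suffix, dirs) {
  n <- vapply(dirs, function(d) nrow(read.csv(find_csv(d, suffix), header = FALSE)),
              integer(1), USE.NAMES = FALSE)
  n <- as.numeric(n)  # avoid integer overflow in the product
  n[1] * n[2] / n[3]
}

# Usage (paths are assumptions):
# dirs <- c("D:/folder1", "D:/folder2", "D:/folder3")
# digits <- c("0023", "0034", "6163")   # e.g. read from theLastFourDigits.txt
# myanswer <- vapply(digits, job_for_suffix, numeric(1), dirs = dirs)
```

The result of the usage line is a numeric vector named by suffix, so each answer stays tied to its file set rather than to a loop index.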

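I have two questions here: (1) How can I repeat this process for all 100 four-digit suffixes? (2) If I did not have the theLastFourDigits.txt list file, how could I do the same job over multiple files?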
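For question (2), the list file is not strictly needed: the suffixes can be recovered from the file names themselves with a regular expression. A minimal sketch, assuming every name ends in an underscore, exactly four digits and .csv; suffixes_from_dir() is a hypothetical name and the folder paths in the usage comment are assumptions.

```r
# Hypothetical helper: four-digit suffixes taken from names like a1_0023.csv
suffixes_from_dir <- function(dir) {
  # keep only names ending in _ + four digits + .csv
  f <- list.files(dir, pattern = "_[0-9]{4}\\.csv$")
  # extract the four digits in front of ".csv"
  sub("^.*_([0-9]{4})\\.csv$", "\\1", f)
}

# Usage (paths are assumptions):
# dirs <- c("D:/folder1", "D:/folder2", "D:/folder3")
# digits <- Reduce(intersect, lapply(dirs, suffixes_from_dir))  # suffixes present in all three folders
```

Taking the intersection across folders means a suffix that is missing from one folder is skipped instead of causing a mismatched triple.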

Edit

I tried the following:

setwd("D:/folder1/")
filelist1 <- Sys.glob("*.csv")
setwd("D:/folder2/")
filelist2 <- Sys.glob("*.csv")
setwd("D:/folder3/")
filelist3 <- Sys.glob("*.csv")

for (i in 1:100) {

 setwd("D:/folder1/")
 a <- read.csv(filelist1[i], header=FALSE, sep=",")
 a <- as.matrix(a)
 setwd("D:/folder2/")
 b <- read.csv(filelist2[i], header=FALSE, sep=",")
 b <- as.matrix(b)
 setwd("D:/folder3/")
 c <- read.csv(filelist3[i], header=FALSE, sep=",")
 c <- as.matrix(c)

 nrow(a)*nrow(b)/nrow(c)

}

The error message is:

 Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  no lines available in input 
3 stop("no lines available in input") 
2 read.table(file = file, header = header, sep = sep, quote = quote, 
    dec = dec, fill = fill, comment.char = comment.char, ...) 
1 read.csv(filelist1[i], header = FALSE, sep = ",") 

What am I missing here?
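On the error itself: read.table(), which read.csv() calls, stops with "no lines available in input" when a file has no lines to read, such as a zero-byte csv, so one likely cause is an empty file in folder1. Two further points about the loop: a bare expression like nrow(a)*nrow(b)/nrow(c) is neither printed nor saved inside a for loop, so the results are lost, and pairing filelist1[i], filelist2[i] and filelist3[i] by position only works if all three folders hold the same suffixes in the same sorted order. A minimal sketch addressing these, assuming header-less files as in the question; count_rows(), empty_csvs() and ratio_by_position() are hypothetical names, and returning NA for an unreadable file is one choice among several.

```r
# Hypothetical helper: row count of a header-less csv, or NA (with a warning)
# when read.csv() fails, e.g. "no lines available in input" for an empty file
count_rows <- function(path) {
  tryCatch(nrow(read.csv(path, header = FALSE)),
           error = function(e) {
             warning(path, ": ", conditionMessage(e), call. = FALSE)
             NA_integer_
           })
}

# Hypothetical helper: zero-byte csv files in a folder (likely culprits)
empty_csvs <- function(dir) {
  f <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  f[file.size(f) == 0]
}

# The question's loop, still pairing files by position, but using full
# paths instead of setwd() and storing each result
ratio_by_position <- function(dirs) {
  files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)
  if (length(unique(lengths(files))) != 1) {
    stop("the folders hold different numbers of csv files")
  }
  myanswer <- numeric(length(files[[1]]))
  for (i in seq_along(myanswer)) {
    myanswer[i] <- count_rows(files[[1]][i]) * count_rows(files[[2]][i]) /
      count_rows(files[[3]][i])
  }
  myanswer
}

# Usage (paths are assumptions):
# empty_csvs("D:/folder1")
# myanswer <- ratio_by_position(c("D:/folder1", "D:/folder2", "D:/folder3"))
```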

1 answer:

Answer 0 (score: 2)

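For question (2), you may find this function useful. I have used it in the past to read all the csv files in a given folder (on Windows 7). You will need to adjust the read.csv() arguments to suit your application. Once all the data in the folder has been read in, you can use lapply() to convert all the data frames to matrices.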

list.csv <- function(mydir, add.source=TRUE) {
    # combine all csv files in a given directory into a single list
    # pattern is a regular expression, so the dot must be escaped
    filenames <- list.files(mydir, pattern="\\.csv$")
    nfiles <- length(filenames)
    # create an empty list where all the files will be stored
    files.list <- vector(mode="list", length=nfiles)
    # seq_len() skips the loop when no csv files are found (1:0 would not)
    for(i in seq_len(nfiles)) {
        # read the file into a temporary data frame;
        # file.path() adds the separator, so mydir needs no trailing slash
        temp <- read.csv(file.path(mydir, filenames[i]), as.is=TRUE)
        # add a new column identifying the source file
        if(add.source) temp$source <- filenames[i]
        # put the data into the list
        files.list[[i]] <- temp
        }
    files.list
    }

mylist <- list.csv("C:/temp/")

# look at headers from all the data frames
lapply(mylist, head)

# convert all the data frames to matrices
mylistm <- lapply(mylist, as.matrix)
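One caveat with this last step: with add.source=TRUE every data frame carries a character source column, and as.matrix() on a data frame containing any character column returns a character matrix, so nrow() still works but arithmetic on the values does not. Also, since the question's files have no header row, the read.csv() call inside list.csv() would need header=FALSE. A minimal sketch that drops the source column before converting; to_numeric_matrix() is a hypothetical name.

```r
# Hypothetical helper: drop the added 'source' column (if present) before
# converting, so all-numeric data frames become numeric matrices
to_numeric_matrix <- function(df) {
  as.matrix(df[setdiff(names(df), "source")])
}

# Usage:
# mylistm <- lapply(mylist, to_numeric_matrix)
```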