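Reading multiple CSV files of the same format into data frames

Asked: 2014-02-20 13:14:05

Tags: r csv macros dataframe

I need to run the same set of code on multiple CSV files, and I would like to do it with something like a macro. Below is the code I am running, but the result is not right: it reads the data in a two-dimensional format, whereas I need it in a three-dimensional format.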

lf = list.files(path = "D:/THD/data", pattern = ".csv",
                full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
ds<-lapply(lf,read.table)

1 answer:

Answer 0: (score: 0)

I don't know whether this will be useful, but one of my approaches is:

    ## Step 1: read the files

    mycsv <- dir(pattern = "\\.csv$")   # regex: only names ending in ".csv"

    n <- length(mycsv)

    mylist <- vector("list", n)

    for (i in seq_len(n)) mylist[[i]] <- read.csv(mycsv[i], header = TRUE)
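The same reading step can also be written with lapply, keeping the file names attached to the list. This is only a minimal sketch: the temporary folder, the two sample files, and the names `csv_dir`, `files`, and `frames` are made up for illustration.

```r
# Sketch: read every CSV in a folder into a named list of data frames.
# The folder and file contents below are invented for this example.
csv_dir <- tempfile("csv_demo_")
dir.create(csv_dir)
write.csv(data.frame(a = 1:2, b = 3:4), file.path(csv_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 5:6, b = 7:8), file.path(csv_dir, "two.csv"), row.names = FALSE)

# "\\.csv$" matches only names that end in ".csv"
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each list element after its file so it can be traced later
frames <- setNames(lapply(files, read.csv), basename(files))

names(frames)
str(frames[["one.csv"]])
```

Naming the elements means a later lapply over `frames` still knows which file each data frame came from.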

After that I really just use the apply family of functions to change things, for example:

    ## Change the column names
    mylist <- lapply(mylist, function(x) {
        names(x) <- c("type", "date", paste0("v", 1:24), "total")
        return(x)
    })

    ## Set the type column to weekend ("we") / weekday ("wd")

    mylist <- lapply(mylist, function(x) {
        f <- c("we", "we", "wd", "wd", "wd", "wd", "wd")
        ## repeat the 7-day pattern over 365 rows (assumes one row per day)
        x$type <- rep(f, length.out = 365)
        return(x)
    })


and so on. 
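The fixed seven-day pattern above only lines up if every file starts on the same day of the week. If the date column can be parsed, the weekday/weekend flag could instead be derived from the dates themselves. A rough sketch, assuming ISO-formatted dates; the sample data frame `x` is invented:

```r
# Sketch: derive "wd"/"we" from the dates instead of a fixed pattern.
# Assumes the date column holds ISO dates such as "2013-01-04".
x <- data.frame(date = c("2013-01-04", "2013-01-05", "2013-01-06", "2013-01-07"))

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday
day_no <- as.integer(format(as.Date(x$date), "%u"))
x$type <- ifelse(day_no >= 6, "we", "wd")
x
```

Using the numeric weekday rather than weekdays() avoids depending on the locale's day names.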

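Once all the changes are done, I save the files again with the code below. It splits the original file name and stores the parts in each data frame, so that I can trace every individual file later.

## For example, some of my files had names following a pattern such as "201_E424220_N563500.csv", so I split the name and keep its parts like this: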

mylist <- lapply(seq_along(mylist), function(i) {
    mylist.i <- mylist[[i]]
    ## drop the ".csv" extension so it does not end up in Northing
    s <- strsplit(sub("\\.csv$", "", mycsv[i]), "_", fixed = TRUE)[[1]]
    d <- cbind(mylist.i[, c("type", "date")], ID = s[1], Easting = s[2], Northing = s[3], mylist.i[, 3:ncol(mylist.i)])
    return(d)
})

for (i in seq_len(n))
    write.csv(mylist[[i]], file = paste("file", i, ".csv", sep = ""), row.names = FALSE)
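If the list is named after the source files, the output could keep the original file names rather than "file1.csv", "file2.csv", and so on. A minimal sketch only: `frames`, `out_dir`, and the second file name are hypothetical stand-ins, and the output goes to a temporary folder.

```r
# Sketch: write each data frame to an output folder under its original name.
# `frames` stands in for mylist; the second file name is invented.
frames <- list("201_E424220_N563500.csv" = data.frame(v1 = 1:2),
               "202_E424320_N563600.csv" = data.frame(v1 = 3:4))
out_dir <- tempfile("out_")
dir.create(out_dir)

for (nm in names(frames)) {
  # frames[[nm]] extracts the data frame itself, not a one-element list
  write.csv(frames[[nm]], file = file.path(out_dir, nm), row.names = FALSE)
}
list.files(out_dir)
```

Writing to a separate folder also keeps the results from being picked up again by a later dir(pattern = ...) call in the input folder.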

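I hope this helps. When you get some time, please read up on the plyr package, because I believe it will be very useful to you: it is a very handy package with many options for data analysis. plyr provides the following apply-style functions: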

## l_ply   split list, apply function and discard result 
## ldply   split list, apply function and return result in data frame 
## laply   split list, apply function and return result in an array
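To make the difference between the three concrete, here is a small sketch on a toy list. It assumes plyr is installed; the list `vecs` and the functions applied to it are invented for illustration.

```r
library(plyr)  # assumes plyr is installed: install.packages("plyr")

vecs <- list(a = 1:3, b = 4:6)

# l_ply: run a function for its side effect only; the result is discarded
l_ply(vecs, function(v) cat("sum:", sum(v), "\n"))

# ldply: each element yields a data frame; the pieces are row-bound together
df <- ldply(vecs, function(v) data.frame(total = sum(v), n = length(v)))

# laply: each element yields a vector; the pieces are stacked into an array
arr <- laply(vecs, function(v) v^2)
dim(arr)
```

With a named list, ldply also adds an `.id` column holding the element names, which is handy for tracking where each row came from.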

For example, you can use ldply to read all the CSV files and return a single data frame, something like:

library(plyr)

data <- ldply(list.files(pattern = "\\.csv$"), function(fname) {
    read.csv(fname, header = TRUE)
})

So here `data` will be one data frame holding the data from all of your CSV files.
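The question asks for a three-dimensional result, which a list or a single data frame does not give directly. If every file has the same number of rows and columns and holds only numeric values, one possibility is to stack the tables into a 3-D array. This is a sketch under those assumptions; the three small data frames stand in for data read from CSV files.

```r
# Sketch: stack several same-shaped numeric tables into a 3-D array.
# The data frames below stand in for the results of read.csv().
tables <- list(
  data.frame(v1 = 1:2, v2 = 3:4),
  data.frame(v1 = 5:6, v2 = 7:8),
  data.frame(v1 = 9:10, v2 = 11:12)
)

# Convert each table to a matrix; simplify2array() then binds equally sized
# matrices along a new third dimension (rows x columns x files)
cube <- simplify2array(lapply(tables, as.matrix))
dim(cube)
cube[, , 2]   # the slice that came from the second table
```

If the files contain non-numeric columns such as type or date, those would need to be dropped or handled separately first, since an array holds a single data type.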

Thanks, 阿燕