Combine files into data frames based on file name in R

Time: 2018-08-20 20:57:42

Tags: r dataframe

I currently have a vector containing a list of file paths such as:

files <- c("C:/Users/Me/Desktop/cc/canada/2016/Ontario.BRU", 
           "C:/Users/Me/Desktop/cc/canada/2017/Ontario.BRU", 
           "C:/Users/Me/Desktop/cc/canada/2018/Ottawa.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ontario.BRU")

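I would like to combine the files whose names end in the same city, one after another, into a single data frame per city. If a city appears only once, I still want to save its data frame as a csv file at the end. This is the code I have so far: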

cad<-NULL
for(b in 1:length(files)){ 
  country<-sub(".*/ *(.*?) */[[:digit:]].*", "\\1", files[b]) 

  if(country=="canada"){ 
    cad<-c(cad, files[b])
  }
    cad_cities <- unique((sub(".*/ *(.*?) *.BRU.*", "\\1", cad)))
    for(c in 1:length(cad_cities)){
      city<-sub(".*/ *(.*?) *.BRU.*", "\\1", cad)
    }
}  

I am stuck after this part. Thanks.

Edit: sample of a data file

2018,1,0,9999,-20.70,-23.00,-22.10,81.00,0.00,000,-991,-991,-991,-2.41,-991,-991,8.90,353,97.36,-991,-991,19.00,-991
2018,1,100,9999,-21.40,-22.70,-22.00,80.00,0.00,100,-991,-991,-991,-2.42,-991,-991,7.80,264,97.36,-991,-991,18.00,-991
2018,1,200,9999,-21.40,-22.50,-21.90,79.00,0.00,200,-991,-991,-991,-2.42,-991,-991,10.30,270,97.34,-991,-991,19.00,-991
2018,1,300,9999,-20.80,-21.90,-21.40,78.00,0.00,300,-991,-991,-991,-2.43,-991,-991,10.70,263,97.32,-991,-991,18.00,-991

2 Answers:

Answer 0: (score: 0)

Maybe something like the following. (First, run the code in the question.)
Untested, since there are no data files.

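for(city in cad_cities){
    # keep only the paths whose file name is exactly <city>.BRU
    tmp <- grep(paste0("/", city, "\\.BRU$"), files, value = TRUE)
    # the sample data is comma-separated with no header row
    tmp <- lapply(tmp, read.table, sep = ",")
    # stack the files of this city one after another
    tmp <- do.call(rbind, tmp)
    write.csv(tmp, file = paste0(city, ".csv"), row.names = FALSE)
}

rm(tmp)    # tidy up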
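
One thing to watch when matching a city name against the paths: grep() does partial regular-expression matching, so a bare city name can also pick up files of other cities that merely contain it. A minimal sketch of the difference, using hypothetical file names that are not in the question:

```r
# Hypothetical paths, made up for illustration only
paths <- c("canada/2018/Ottawa.BRU", "canada/2018/OttawaSouth.BRU")

# A bare name matches anywhere in the string, so both paths are returned
loose <- grep("Ottawa", paths, value = TRUE)

# Anchoring on the preceding "/" and the ".BRU" suffix keeps the exact file name only
strict <- grep("/Ottawa\\.BRU$", paths, value = TRUE)

loose
strict
```

If the city names never overlap, the bare pattern is probably fine; the anchored form is just the safer default.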

Answer 1: (score: 0)

First, extract the cities from the file names:

cities <- sub("\\.BRU", "", basename(files))
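
As a quick check of this step, the sketch below runs the extraction on the paths from the question. It uses tools::file_path_sans_ext() as an alternative to sub(); that swap is my assumption, not part of the answer, and it simply drops whatever the final extension is.

```r
files <- c("C:/Users/Me/Desktop/cc/canada/2016/Ontario.BRU",
           "C:/Users/Me/Desktop/cc/canada/2017/Ontario.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ottawa.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ontario.BRU")

# basename() strips the directories, file_path_sans_ext() strips ".BRU"
cities <- tools::file_path_sans_ext(basename(files))

cities          # one city name per file, in the same order as files
table(cities)   # how many files belong to each city
```

The result has one entry per file, which is what split() needs later as its grouping vector.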

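Now read all the files:

dataz <- lapply(files, read.csv, header = FALSE, as.is = TRUE)
# header = FALSE because the sample data has no header row;
# as.is = TRUE keeps text columns as character instead of factor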

Then regroup the data that comes from the same city:

lapply(split(dataz, cities), function(x) do.call(rbind,x))

This strategy should work, but it may need slight modification since it is untested.

[Edit]

Test case with random data:

dataz <- lapply(1:4, function(iii) as.data.frame(replicate(3, rnorm(5))))
lapply(split(dataz, cities), function(x) do.call(rbind,x))
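
To also cover the "save as csv" part of the question, here is a rough end-to-end sketch. It assumes header-less, comma-separated files like the sample above; the temporary directories, the made-up numbers, and the names root, rel, out_dir and combined are all placeholders of mine rather than anything from the question.

```r
# Build a throwaway directory tree that mimics the layout in the question
root  <- file.path(tempdir(), "cc", "canada")
rel   <- c("2016/Ontario.BRU", "2017/Ontario.BRU",
           "2018/Ottawa.BRU", "2018/Ontario.BRU")
files <- file.path(root, rel)
for (f in files) {
  dir.create(dirname(f), recursive = TRUE, showWarnings = FALSE)
  # two header-less rows of made-up numbers per file
  write.table(matrix(round(rnorm(6), 2), nrow = 2), f, sep = ",",
              row.names = FALSE, col.names = FALSE)
}

# Same three steps as above: city names, read, regroup by city
cities   <- sub("\\.BRU$", "", basename(files))
dataz    <- lapply(files, read.csv, header = FALSE)
combined <- lapply(split(dataz, cities), function(x) do.call(rbind, x))

# Write one csv per city, including cities that have a single file
out_dir <- file.path(tempdir(), "combined")
dir.create(out_dir, showWarnings = FALSE)
for (city in names(combined)) {
  write.csv(combined[[city]], file.path(out_dir, paste0(city, ".csv")),
            row.names = FALSE)
}

sapply(combined, nrow)   # rows per city after stacking
```

With real data, files would be the original vector of paths and out_dir whatever folder the csv files should go to.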