I have over 7,000 *.bil files that I am trying to merge into a single *.csv file and export. I am able to read a *.bil file using raster and as.data.frame:
setwd("/.../Prism Weather Data All/")
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = ".bil")
r = raster("PRISM_ppt_stable_4kmM2_189501_bil.bil")
test <- as.data.frame(r, na.rm=TRUE)
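This sets the working directory and grabs all the *.bil files. So far I have only rasterized one file and converted it with as.data.frame to verify that it reads correctly, which works perfectly. What I am trying to figure out is how to merge all 7,000 files (filenames) into one.
Any help with this would be greatly appreciated. Thanks in advance.
Answer 0 (score: 1)
Assuming 7000 is the actual number rather than an approximation, and that the data in every file has the same structure (same number of columns and rows):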
library(raster)
setwd("/.../Prism Weather Data All/")
nc <- NA ## replace NA with the number of columns of each file (assuming they're all the same)
nr <- NA ## replace NA with the number of rows of each file (assuming they're all the same)
# "\\.bil$" matches only names ending in .bil (not e.g. .bil.aux.xml)
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")
# initialize what is likely to be a large object
final.df<-as.data.frame(matrix(NA,ncol=length(filenames)*nc,nrow=nr))
counter=1
# loop through the files
for (i in filenames){
r = raster(i)
test <- as.data.frame(r, na.rm=TRUE)
# ":" binds tighter than "+", so the upper bound needs parentheses
final.df[,counter:(counter+nc-1)]<-test
counter<-counter+nc
}
# write the csv
write.csv(final.df,"final-filename.csv")
Keep in mind that your computer must have enough memory to hold all the data, because R needs to keep objects in memory.
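If memory is the limiting factor, one option is to write each file's rows to the CSV as soon as they are read, so that only one file is held in memory at a time. The following is a minimal base-R sketch of that idea, not the method used in this answer: `append_csv` is a hypothetical helper, the toy data frames stand in for `as.data.frame(raster(i), na.rm=TRUE)`, and the result is a long layout (files stacked as rows) rather than one column block per file.

```r
# Hypothetical helper: append one data frame to a CSV file,
# writing the header only when the file does not exist yet.
append_csv <- function(df, path) {
  first <- !file.exists(path)
  write.table(df, path, sep = ",", row.names = FALSE,
              col.names = first, append = !first)
}

out <- tempfile(fileext = ".csv")

# Toy stand-ins for the per-file data frames (assumption: in real use
# each one would come from as.data.frame(raster(i), na.rm = TRUE)).
chunks <- list(data.frame(date = "189501", value = c(1.5, 2.5)),
               data.frame(date = "189502", value = c(3.5, 4.5)))

# Only the current chunk needs to be in memory while it is written.
for (chunk in chunks) append_csv(chunk, out)

result <- read.csv(out)
```

Reading the file back with `read.csv` is only there to check the round trip; for the full data set you would normally skip that step.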
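If the number of columns varies from file to file, you can adapt this by adjusting the index in the final.df assignment inside the loop and incrementing counter accordingly.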
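As a rough illustration of that adjustment, the sketch below advances the counter by each data frame's own column count. The toy data frames are assumptions standing in for the per-file results, and the total column count is computed up front only because the toy data is already in memory; with real files you would need to know or over-estimate it.

```r
# Toy stand-ins for per-file data frames with different column counts
# (assumption: every file still has the same number of rows).
frames <- list(data.frame(x = 1:3), data.frame(y = 4:6, z = 7:9))

total_cols <- sum(vapply(frames, ncol, integer(1)))
final.df <- as.data.frame(matrix(NA, nrow = 3, ncol = total_cols))

counter <- 1
for (test in frames) {
  k <- ncol(test)                    # columns contributed by this file
  cols <- counter:(counter + k - 1)  # block of columns to fill
  final.df[, cols] <- test
  names(final.df)[cols] <- names(test)
  counter <- counter + k             # next free column
}
```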
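Edit: to produce the expected result
I think a for loop is about the only way to get this kind of job done. 7000 files really is a very large set, so expect to spend some time watching it iterate.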
library(raster)
setwd("/.../Prism Weather Data All/")
nc <- NA ## replace NA with the number of columns you expect the data in the files to have
nr <- NA ## replace NA with roughly the number of rows times 12 (if you plan to read a year worth of data)
## PLUS some tolerance, so you'll end up with an object actually larger than needed
# "\\.bil$" matches only names ending in .bil (not e.g. .bil.aux.xml)
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")
# initialize what is likely to be a large object
final.df<-as.data.frame(matrix(NA,ncol=nc,nrow=nr))
counter=1
# loop through the files
for (i in filenames){
r = raster(i)
test <- as.data.frame(r, na.rm=TRUE)
numrow2<-nrow(test)
# ":" binds tighter than "+", so the upper bound needs parentheses
final.df[counter:(counter+numrow2-1),]<-test
counter<-counter+numrow2
}
final.df <- final.df[seq_len(counter-1), , drop=FALSE] ## keep only the filled rows
# write the csv
write.csv(final.df,"final-filename.csv")
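A note on block ranges in loops like the ones above: in R the `:` operator has higher precedence than binary `+` and `-`, so an unparenthesized upper bound does not produce a range of indices. The variable names in this small demonstration are illustrative only.

```r
counter <- 1
n <- 3

wrong <- counter:counter + n        # parsed as (counter:counter) + n
right <- counter:(counter + n - 1)  # the n indices starting at counter

print(wrong)  # 4
print(right)  # 1 2 3
```

The same rule applies to expressions such as `counter-1:nrow(final.df)`, which is read as `counter - (1:nrow(final.df))`.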
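Hope it helps.
Answer 1 (score: 1)
I have been working with Prism data, and below is another approach. It works if you can merge on each "station" or row name across the 7,000 .bil files. In that case, each month becomes a separate column, corresponding to the same station ID / row.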
setwd("/.../Prism Weather Data All/")
require(dplyr)
require(raster)
#This makes sure only .bil is read (not asc.bil, etc)
filenames <- dir("/.../Prism Weather Data All/", pattern = "\\.bil$")
z <- as.data.frame(matrix(NA))
#loop through the data, and name each column the name of the date in
#the spreadsheet (according to Prism's naming convention, the date
#starts at character 24 and ends at character 29)
for (file in filenames){
r <- raster(file)
test <- as.data.frame(r, na.rm=TRUE, row.names=TRUE, col.names=FALSE)
names(test)<- c(substring(file, 24, 29))
z <- cbind(z, test)
}
#then export the data.frame to CSV!
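The loop above relies on every file yielding the same rows in the same order. If that might not hold (for example, because `na.rm=TRUE` drops different cells in different months), one hedged alternative is to align on the row names explicitly with `merge`. This base-R sketch uses toy data frames and made-up column names (`ppt_189501`, `ppt_189502`) as stand-ins for the per-file results; `with_id` is a hypothetical helper.

```r
# Toy stand-ins for two monthly data frames whose row names are cell IDs
# (assumption: real ones would come from as.data.frame(raster(file), na.rm = TRUE)).
m1 <- data.frame(ppt_189501 = c(1, 2, 3), row.names = c("1", "2", "3"))
m2 <- data.frame(ppt_189502 = c(5, 6), row.names = c("2", "3"))

# Hypothetical helper: move row names into an explicit key column
# so that merge() can align on it.
with_id <- function(df) cbind(id = rownames(df), df)

# all = TRUE keeps cells that are missing in some months (filled with NA).
wide <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE),
               lapply(list(m1, m2), with_id))

# Export the merged table; the temporary path is just for illustration.
out <- tempfile(fileext = ".csv")
write.csv(wide, out, row.names = FALSE)
```

Repeated `merge` calls are slower than a single `cbind`, so for 7,000 files this is only worth it when the row sets genuinely differ.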