R: Create a loop to import and manipulate (spatial join) multiple files

Posted: 2017-01-31 00:36:31

Tags: r loops geospatial multiple-files

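I'm a basic R user and I need your help. I have multiple data files that I want to process with a loop: import one or two files, process them, remove them, and repeat this several times. This is probably simple code for many of you, but please help me with it.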

Here is how I import and process the data for a single file:

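library(sp)        # coordinates(), proj4string(), CRS(), over()
library(maptools)  # readShapeSpatial()

## header = TRUE assumes the file has a header row naming the lon and lat columns;
## with header = FALSE the columns are named V1, V2, ... and ~lon+lat would fail.
## read.table() already returns a data.frame, so as.data.frame() is not needed.
test <- read.table("test.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)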

## prepare the points for the spatial join with the polygons
coordinates(test) <- ~lon+lat
proj4string(test) <- CRS("+proj=longlat +datum=NAD83")

## Import gis polygon shapefile
ZIPshp <-readShapeSpatial("D:/data/gis/Zipcode.shp",proj4string=CRS("+proj=longlat +datum=NAD83")) 

## spatial join b/w point and polygon 
test_zip <- over(test, ZIPshp[,"zipc"])
test_zip <- subset(test_zip, zipc!="") 

write.table(test_zip, "test_zip.csv", sep=",", na="NA", row.names = FALSE)
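The last filtering line depends on how missing values are handled. over() returns one row per point and NA for points that fall outside every polygon, and subset() treats an NA condition as FALSE, so subset(test_zip, zipc != "") drops both empty and missing ZIP codes, while a bare logical index does not. The small data frame below is a made-up stand-in for the output of over(); the ZIP values are illustrative only.

```r
# Made-up stand-in for the result of over(test, ZIPshp[, "zipc"]):
# one row per point, NA where a point lies outside every polygon.
test_zip <- data.frame(zipc = c("12345", NA, "", "67890"),
                       stringsAsFactors = FALSE)

# subset() treats an NA condition as FALSE, so it drops the NA row
# as well as the empty-string row.
kept <- subset(test_zip, zipc != "")
print(kept)

# A bare logical index keeps the NA comparison and returns an all-NA row
# instead of dropping it (3 rows here).
naive <- test_zip[test_zip$zipc != "", , drop = FALSE]

# Equivalent to subset(): exclude NA explicitly.
kept_explicit <- test_zip[!is.na(test_zip$zipc) & test_zip$zipc != "", ,
                          drop = FALSE]
```

If you ever replace subset() with plain indexing, the explicit is.na() check is what keeps the output CSV free of empty rows.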

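However, I haven't figured out how to write a loop that repeats this process several times, and in particular how to remove each data frame once it has been processed. Here is my attempt, but it is still missing a key part, and I really need your help. (I also looked at do.call and lapply but couldn't come up with a solution.)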

files <- list.files(pattern = "\\.txt$")   # pattern is a regular expression, not a glob

ldf <- list()
for (i in seq_along(files)) {
  ldf[[i]] <- read.table(files[[i]], header = TRUE, sep = "\t", stringsAsFactors = FALSE)

  coordinates(ldf[[i]]) <- ~lon+lat
  proj4string(ldf[[i]]) <- CRS("+proj=longlat +datum=NAD83")
}

(The missing parts are the spatial join, removing each processed data frame, and repeating the process for the next file.)
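The missing pieces (join, discard, repeat) fit naturally into a for loop that handles one file per iteration instead of keeping every data frame in a list. The sketch below is runnable base R under stated assumptions: process_points() is a hypothetical placeholder for the sp steps above, the demo files and their lon/lat values are made up, and each input file is assumed to have a header row naming lon and lat.

```r
# HYPOTHETICAL placeholder for the sp steps (coordinates(), proj4string(),
# over(), subset()); here it only drops rows with missing coordinates.
process_points <- function(df) {
  df[!is.na(df$lon) & !is.na(df$lat), , drop = FALSE]
}

# Demo input: two small tab-separated files with made-up coordinates,
# each with a header row naming lon and lat (an assumption).
demo_dir <- tempfile("points_")
dir.create(demo_dir)
write.table(data.frame(lon = c(-83.7, NA), lat = c(42.3, 42.2)),
            file.path(demo_dir, "a.txt"), sep = "\t", row.names = FALSE)
write.table(data.frame(lon = c(-83.6, -83.5), lat = c(42.1, 42.0)),
            file.path(demo_dir, "b.txt"), sep = "\t", row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

for (f in files) {
  # 1. import one file
  pts <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  # 2. process it (the spatial join in the real script)
  out <- process_points(pts)
  # 3. write the result beside the input, swapping .txt for .csv
  write.csv(out, sub("\\.txt$", ".csv", f), row.names = FALSE)
  # 4. discard this file's objects before moving on to the next one
  rm(pts, out)
}

# Optional: R collects garbage automatically, but gc() forces it now.
invisible(gc())
```

Only one file's data is in memory at a time, which is the point of the rm() step; the shapefile itself should still be read once, before the loop.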

Please help! Thanks,

1 Answer:

Answer 0 (score: 0)

You can use the following as a skeleton to complete your solution:

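library(sp)        # coordinates(), proj4string(), CRS(), over()
library(maptools)  # readShapeSpatial()
options(stringsAsFactors = FALSE)

## Import gis polygon shapefile once, outside the loop
ZIPshp <- readShapeSpatial("D:/data/gis/Zipcode.shp",
    proj4string = CRS("+proj=longlat +datum=NAD83"))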

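## read in each file and process it; test and test_zip are local to the
## function, so they are discarded after each file without an explicit rm()
lapply(list.files(pattern = "\\.txt$"), function(txtfile) {
    ## header = TRUE assumes each file names its lon and lat columns
    test <- read.table(txtfile, header = TRUE, sep = "\t")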

    ## prepare the points for the spatial join with the polygons
    coordinates(test) <- ~lon+lat  
    proj4string(test) <- CRS("+proj=longlat +datum=NAD83")  

    ## spatial join b/w point and polygon 
    test_zip <- over(test, ZIPshp[,"zipc"])
    test_zip <- subset(test_zip, zipc!="") 

    ## output processed file as a csv
    write.csv(test_zip, 
        paste0(tools::file_path_sans_ext(txtfile), ".csv"), 
        row.names = FALSE) 
})
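The skeleton can be exercised end to end without the shapefile. This sketch is an assumption-laden variant, not the answer's own code: process_points() is a hypothetical placeholder for the coordinates()/over()/subset() steps, the demo files are made up, and vapply() is swapped in for lapply() so the call returns the paths of the written CSV files instead of a list of NULL values from write.csv().

```r
# HYPOTHETICAL placeholder for the spatial join; keeps points with lon < 0.
process_points <- function(df) df[df$lon < 0, , drop = FALSE]

# Demo input: two made-up tab-separated files with lon/lat headers.
demo_dir <- tempfile("lapply_demo_")
dir.create(demo_dir)
for (nm in c("jan", "feb")) {
  write.table(data.frame(lon = c(-83.7, 10), lat = c(42.3, 42.2)),
              file.path(demo_dir, paste0(nm, ".txt")),
              sep = "\t", row.names = FALSE)
}

# Each call gets its own environment: test and test_zip exist only while
# one file is being processed, so no rm() is needed between files.
outputs <- vapply(
  list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE),
  function(txtfile) {
    test <- read.table(txtfile, header = TRUE, sep = "\t")
    test_zip <- process_points(test)
    out_file <- paste0(tools::file_path_sans_ext(txtfile), ".csv")
    write.csv(test_zip, out_file, row.names = FALSE)
    out_file  # return the path instead of write.csv()'s NULL
  },
  character(1), USE.NAMES = FALSE
)
```

Because each call runs in its own environment, test and test_zip never reach the global workspace; that is what takes the place of the rm() step the question asks about.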