My goal is to download shapefiles for the entire United States from the Census Bureau. I wrote some functions that download the zip files from the Census site into a temporary directory (looping to get each state), process the files so R can use them, and then throw the rest away. In theory I end up with one big data frame containing the data I need, which I write out to a .csv so I don't have to repeat the whole process later.
But! After a few states, R keeps crashing. I think the problem is how I'm temporarily storing the data, but there could easily be other issues that make the code run painfully slowly and then freeze. If I only do a single state, it works perfectly.
This is where I get all the state FIPS numbers and build the URL of the zip file to download for each state.
# State FIPS codes for the 50 states plus DC
mystates.us.unique = c("1", "2", "4", "5", "6", "8", "9", "10", "11", "12", "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", "56")
# Zero-pad single-digit codes so they match the file names on the Census server
mystates.us.url = ifelse(nchar(mystates.us.unique) == 1, paste("0", mystates.us.unique, sep = ""),
                         ifelse(nchar(mystates.us.unique) == 2, mystates.us.unique, "HELPPPP"))
url = paste("http://www2.census.gov/geo/tiger/TIGER2010/BG/2000/tl_2010_",
            mystates.us.url, "_bg00.zip", sep = "")
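(If it is clearer, the same zero-padding step can be sketched with formatC on the same mystates.us.unique vector, producing the same url vector:)

# Alternative sketch of the zero-padding step using formatC
mystates.us.url = formatC(as.integer(mystates.us.unique), width = 2, flag = "0")
url = paste0("http://www2.census.gov/geo/tiger/TIGER2010/BG/2000/tl_2010_",
             mystates.us.url, "_bg00.zip")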
Here is the function that makes the shapefiles usable.
process.shapefiles = function(folder, shapefile){
  # Read the shapefile layer from the (unzipped) folder
  block = readOGR(dsn = folder, layer = shapefile)
  # Tag each polygon with an id so the fortified points can be joined back to the attributes
  block@data$id = rownames(block@data)
  block.points = fortify(block, region = "id")
  block.df = join(block.points, block@data, by = "id")
  # Keep only the plotting columns plus the block-group id, renamed to GEOID
  block.df = subset(block.df, select = c(long, lat, group, BKGPIDFP00))
  names(block.df) = c("long", "lat", "group", "GEOID")
  block.df
}
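For what it's worth, a single state processes without any problems. A sketch of a one-state run (Alabama, FIPS 01, so the layer name is tl_2010_01_bg00) looks like this:

# One-state sketch: download, unzip, and process just url[1]
temp.dir = tempdir()
temp.file = tempfile()
download.file(url = url[1], destfile = temp.file)
unzip(zipfile = temp.file, exdir = temp.dir)
one.state = process.shapefiles(temp.dir, "tl_2010_01_bg00")
head(one.state)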
Here is where I actually go to each URL, download the zip into a temp folder, process it with the function above, and put each state into the corresponding entry of the list state.shapes.
state.shapes = vector("list", length(url))   # one slot per state
for(i in 1:length(url)){
  temp.dir = tempdir()
  temp.file = tempfile()
  # Download this state's zip and extract it into the temp directory
  download.file(url = url[i], destfile = temp.file)
  data = unzip(zipfile = temp.file, exdir = temp.dir)
  # The layer name is the zip file name without the .zip extension
  state.shapes[[i]] = process.shapefiles(temp.dir, strsplit(basename(url[i]), split = "\\.")[[1]][1])
  # Try to clean up the temporary files before the next state
  unlink(temp.dir)
  unlink(temp.file)
}
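After the loop, the plan (not shown above) is to stack everything into one data frame and write it out, roughly like this (a sketch; the output file name is just a placeholder):

# Combine all the per-state data frames (same columns in each) and save to .csv
all.states = do.call(rbind, state.shapes)
write.csv(all.states, file = "us_blockgroups.csv", row.names = FALSE)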
Any ideas? Note: you need the packages rgdal, plyr, and utils (full library() calls are sketched below).
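For reference, the package loads at the top of my script look roughly like this (a sketch; note that fortify() for sp objects comes from ggplot2, which I load as well):

library(rgdal)    # readOGR()
library(plyr)     # join()
library(ggplot2)  # fortify() for SpatialPolygonsDataFrame
library(utils)    # download.file(), unzip(); attached by default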