My goal is to download shapefiles for the entire United States from the Census Bureau. I wrote some functions that download the zip files from the Census site into a temporary directory (looping to get each state), process the files so R can use them, and then throw the rest away. In theory I end up with one big data frame containing the data I need, which I write out to a .csv so I don't have to repeat the whole process later.
But! After a few states, R keeps crashing. I think the problem is how I'm temporarily storing the data, but there could easily be other issues that make the code run painfully slowly and then freeze. If I only do a single state, it works perfectly.
This is where I get all the state FIPS numbers and build the URL of the zip file to download for each state.
# State FIPS codes for the 50 states plus DC
mystates.us.unique = c("1", "2", "4", "5", "6", "8", "9", "10", "11", "12", "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", "56")
# Zero-pad single-digit codes so they match the file names on the Census server
mystates.us.url = ifelse(nchar(mystates.us.unique) == 1, paste("0", mystates.us.unique, sep = ""),
                         ifelse(nchar(mystates.us.unique) == 2, mystates.us.unique, "HELPPPP"))
url = paste("http://www2.census.gov/geo/tiger/TIGER2010/BG/2000/tl_2010_",
            mystates.us.url, "_bg00.zip", sep = "")
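(If it is clearer, the same zero-padding step can be sketched with formatC on the same mystates.us.unique vector, producing the same url vector:)

# Alternative sketch of the zero-padding step using formatC
mystates.us.url = formatC(as.integer(mystates.us.unique), width = 2, flag = "0")
url = paste0("http://www2.census.gov/geo/tiger/TIGER2010/BG/2000/tl_2010_",
             mystates.us.url, "_bg00.zip")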
Here is the function that makes the shapefiles usable.
process.shapefiles = function(folder, shapefile){
  # Read the shapefile layer from the (unzipped) folder
  block = readOGR(dsn = folder, layer = shapefile)
  # Tag each polygon with an id so the fortified points can be joined back to the attributes
  block@data$id = rownames(block@data)
  block.points = fortify(block, region = "id")
  block.df = join(block.points, block@data, by = "id")
  # Keep only the plotting columns plus the block-group id, renamed to GEOID
  block.df = subset(block.df, select = c(long, lat, group, BKGPIDFP00))
  names(block.df) = c("long", "lat", "group", "GEOID")
  block.df
}
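For what it's worth, a single state processes without any problems. A sketch of a one-state run (Alabama, FIPS 01, so the layer name is tl_2010_01_bg00) looks like this:

# One-state sketch: download, unzip, and process just url[1]
temp.dir = tempdir()
temp.file = tempfile()
download.file(url = url[1], destfile = temp.file)
unzip(zipfile = temp.file, exdir = temp.dir)
one.state = process.shapefiles(temp.dir, "tl_2010_01_bg00")
head(one.state)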
Here is where I actually go to each URL, download the zip into a temp folder, process it with the function above, and put each state into the corresponding entry of the list state.shapes.
state.shapes = vector("list", length(url))   # one slot per state
for(i in 1:length(url)){
  temp.dir = tempdir()
  temp.file = tempfile()
  # Download this state's zip and extract it into the temp directory
  download.file(url = url[i], destfile = temp.file)
  data = unzip(zipfile = temp.file, exdir = temp.dir)
  # The layer name is the zip file name without the .zip extension
  state.shapes[[i]] = process.shapefiles(temp.dir, strsplit(basename(url[i]), split = "\\.")[[1]][1])
  # Try to clean up the temporary files before the next state
  unlink(temp.dir)
  unlink(temp.file)
}
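After the loop, the plan (not shown above) is to stack everything into one data frame and write it out, roughly like this (a sketch; the output file name is just a placeholder):

# Combine all the per-state data frames (same columns in each) and save to .csv
all.states = do.call(rbind, state.shapes)
write.csv(all.states, file = "us_blockgroups.csv", row.names = FALSE)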
Any ideas? Note: you need the packages rgdal, plyr, and utils (full library() calls are sketched below).
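For reference, the package loads at the top of my script look roughly like this (a sketch; note that fortify() for sp objects comes from ggplot2, which I load as well):

library(rgdal)    # readOGR()
library(plyr)     # join()
library(ggplot2)  # fortify() for SpatialPolygonsDataFrame
library(utils)    # download.file(), unzip(); attached by default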