R: convert zipcode or lat/long to county

Asked: 2012-11-09 21:19:41

Tags: r geolocation geocoding

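I have a list of locations that contains a city, state, zip code, latitude and longitude for each location.

Separately, I have a list of county-level economic indicators. I have tried the zipcode package, the ggmap package, and several other free geocoding websites, including the US Gazetteer files, but can't seem to find a way to match the two files.

Are there currently any packages or other sources that do this?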

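4 answers:

Answer 0 (score: 19)

I ended up using the suggestion from Josh O'Brien mentioned above, which can be found here

I took his code and changed state to county, as shown below: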

library(sp)
library(maps)
library(maptools)

# The single argument to this function, pointsDF, is a data.frame in which:
#   - column 1 contains the longitude in degrees (negative in the US)
#   - column 2 contains the latitude in degrees

latlong2county <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per county
    counties <- map('county', fill=TRUE, col="transparent", plot=FALSE)
    IDs <- sapply(strsplit(counties$names, ":"), function(x) x[1])
    counties_sp <- map2SpatialPolygons(counties, IDs=IDs,
                     proj4string=CRS("+proj=longlat +datum=WGS84"))

    # Convert pointsDF to a SpatialPoints object 
    pointsSP <- SpatialPoints(pointsDF, 
                    proj4string=CRS("+proj=longlat +datum=WGS84"))

    # Use 'over' to get _indices_ of the Polygons object containing each point 
    indices <- over(pointsSP, counties_sp)

    # Return the county names of the Polygons object containing each point
    countyNames <- sapply(counties_sp@polygons, function(x) x@ID)
    countyNames[indices]
}

# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))

latlong2county(testPoints)
# [1] "wisconsin,juneau" "oregon,crook"   (it works)
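
To merge these results with a county-level table, it usually helps to split the "state,county" strings into separate columns. The following is a minimal base-R sketch, assuming the strings have exactly the `state,county` form returned above. The helper name `split_county` is hypothetical and not part of any package. An NA in the input (used here to stand in for a point that matched no county) stays NA in both columns.

```r
# Hypothetical helper: split "state,county" strings, as returned by
# latlong2county(), into a two-column data.frame.
split_county <- function(x) {
    parts <- strsplit(x, ",", fixed = TRUE)
    data.frame(
        state  = vapply(parts, function(p) p[1], character(1)),
        county = vapply(parts, function(p) p[2], character(1)),
        stringsAsFactors = FALSE
    )
}

# Two strings from the test above plus an NA standing in for an unmatched point
res <- split_county(c("wisconsin,juneau", "oregon,crook", NA))
res
```

The resulting `state` and `county` columns can then be used as keys in `merge()` against the economic indicators, provided that table uses the same lowercase naming.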

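Answer 1 (score: 8)

Matching zip codes to counties is difficult. (Some zip codes span more than one county, and sometimes even more than one state, for example 30165.)

I am not aware of any specific R package that can match these.

However, you can get a nice table from the Missouri Census Data Center. You can use the following for the data extraction: http://bit.ly/S63LNU

Sample output might look like this: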

    state,zcta5,ZIPName,County,County2
    01,30165,"Rome, GA",Cherokee AL,
    01,31905,"Fort Benning, GA",Russell AL,
    01,35004,"Moody, AL",St. Clair AL,
    01,35005,"Adamsville, AL",Jefferson AL,
    01,35006,"Adger, AL",Jefferson AL,Walker AL
    ...

Note County2. The metadata description can be found here

    county 
    The county in which the ZCTA is all or mostly contained. Over 90% of ZCTAs fall entirely within a single county.

    county2 
    The "secondary" county for the ZCTA, i.e. the county which has the 2nd largest intersection with it. Over 90% of the time this value will be blank.

See also the ANSI county codes: http://www.census.gov/geo/www/ansi/ansi.html
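
As a rough sketch of how such an extract could be used in R, the snippet below reads the sample rows shown above (pasted inline as text; a real extract would be read from the downloaded file) and flags the ZCTAs that have a secondary county. The variable names `csv_text`, `xwalk` and `multi_county` are made up for this illustration.

```r
# Sample rows copied from the output above
csv_text <- 'state,zcta5,ZIPName,County,County2
01,30165,"Rome, GA",Cherokee AL,
01,31905,"Fort Benning, GA",Russell AL,
01,35004,"Moody, AL",St. Clair AL,
01,35005,"Adamsville, AL",Jefferson AL,
01,35006,"Adger, AL",Jefferson AL,Walker AL'

# Read every column as character so leading zeros (e.g. state "01") survive
xwalk <- read.csv(text = csv_text, colClasses = "character",
                  stringsAsFactors = FALSE)

# A ZCTA has a secondary county whenever County2 is non-empty
xwalk$multi_county <- nzchar(xwalk$County2)

# Show only the ZCTAs that need special handling
xwalk[xwalk$multi_county, c("zcta5", "County", "County2")]
```

How to treat the flagged rows (assign them to the primary county, or split them) depends on the analysis; the table only tells you which ones are ambiguous.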

Answer 2 (score: 5)

I think the package "noncensus" is helpful.

Here is the corresponding code I use to match zipcode with county:

### code to get the county based on zipcode

library(noncensus)
data(zip_codes)
data(counties)

state_fips  = as.numeric(as.character(counties$state_fips))
county_fips = as.numeric(as.character(counties$county_fips))    
counties$fips = state_fips*1000+county_fips    
zip_codes$fips =  as.numeric(as.character(zip_codes$fips))

# test
temp = subset(zip_codes, zip == "30329")    
subset(counties, fips == temp$fips)
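
One thing to keep in mind with the numeric key above: converting to numeric drops leading zeros, so state 01 with county 001 becomes 1001. That is harmless as long as both tables use the numeric form, but many county-level data sets store FIPS codes as 5-character strings. The sketch below shows both forms, using toy vectors rather than the noncensus data.

```r
# Toy values (not taken from noncensus): state and county FIPS parts as strings
state_fips  <- c("01", "13")
county_fips <- c("001", "121")

# Numeric key, built the same way as above: the leading zero is lost
fips_num <- as.numeric(state_fips) * 1000 + as.numeric(county_fips)

# Zero-padded 5-character key, for joining to tables that store FIPS as text
fips_chr <- sprintf("%05d", as.integer(fips_num))

fips_num
fips_chr
```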

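Answer 3 (score: 3)

A simple option is to use the geocode() function in ggmap, with the option output="more" or output="all".

This can take flexible input, such as an address or lat/lon, and returns the address, city, county, state, country, postal code, etc., as a list.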

require("ggmap")
address <- geocode("Yankee Stadium", output="more")

str(address)
$ lon                        : num -73.9
$ lat                        : num 40.8
$ type                       : Factor w/ 1 level "stadium": 1
$ loctype                    : Factor w/ 1 level "approximate": 1
$ address                    : Factor w/ 1 level "yankee stadium, 1 east 161st street, bronx, ny 10451, usa": 1
$ north                      : num 40.8
$ south                      : num 40.8
$ east                       : num -73.9
$ west                       : num -73.9
$ postal_code                : chr "10451"
$ country                    : chr "united states"
$ administrative_area_level_2: chr "bronx"
$ administrative_area_level_1: chr "ny"
$ locality                   : chr "new york"
$ street                     : chr "east 161st street"
$ streetNo                   : num 1
$ point_of_interest          : chr "yankee stadium"
$ query                      : chr "Yankee Stadium"

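Another solution is to use a Census shapefile and the same over() command from the question. I ran into a problem using the maptools base map: because it uses the WGS84 datum, in North America points within a few miles of the coast were mapped incorrectly, and about 5% of my data set did not match.

Try this, using the sp package and the Census TIGER/Line shapefiles: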

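library(sp)
library(maptools)  # provides readShapeSpatial()

# Read the Census county shapefile (path as used in this answer)
counties <- readShapeSpatial("maps/tl_2013_us_county.shp", proj4string=CRS("+proj=longlat +datum=NAD83"))

# Convert pointsDF (longitude in column 1, latitude in column 2) to a SpatialPoints object
pointsSP <- SpatialPoints(pointsDF, proj4string=CRS("+proj=longlat +datum=NAD83"))

# over() returns one row of county attributes per point; NAMELSAD holds the county name
countynames <- over(pointsSP, counties)
countynames <- countynames$NAMELSAD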