Geoplot multiple addresses from XML in R

Time: 2013-01-30 13:28:47

Tags: xml r geocoding

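I am trying to accumulate addresses so that I can plot them on a map in R. I collect the addresses manually and enter them into a .csv, which I then import into R. The .csv is formatted as follows: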

  

streetnumber | street | city | state
1150 | FM 1960 West Road | Houston | TX
701 | Keller Parkway | Keller | TX

Each header (streetnumber, street, city and state) is its own column, and the data below it are split into the corresponding columns.
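Because the fields are separated by "|" with spaces around it, read.table() with sep = "|" and strip.white = TRUE gives one column per header and trims the padding from each value. A minimal sketch, with the two example rows inlined through the text argument in place of the actual .csv file (whose name the question does not give):

```r
# Read the pipe-delimited example; strip.white = TRUE removes the spaces
# around each "|" so that " Houston " becomes "Houston"
data <- read.table(text = "streetnumber | street | city | state
1150 | FM 1960 West Road | Houston | TX
701 | Keller Parkway | Keller | TX",
                   header = TRUE, sep = "|", strip.white = TRUE,
                   stringsAsFactors = FALSE)

# One column per header, one row per address
str(data)
```

For a real file, pass its path as the first argument instead of text =.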

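I have R read the information from the .csv and convert it into a format the Google Maps API can use. The API then returns an .xml file containing the information for the entered addresses. A minimal working example: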

streetnumber1<-paste(data$streetnumber,sep="")
street1<-gsub(" ","+",data$street)
street2<-paste(street1,sep="")
city1<-paste(data$city,sep="")
state1<-paste(data$state,sep="")

url<-paste("http://maps.googleapis.com/maps/api/geocode/xml?address="
,streetnumber1,"+",street2,",+",city1,",+",state1,"&sensor=false",sep="")

Calling url produces two URLs, which can be pasted into a web browser to view the .xml data returned by the Google Maps API.
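Calling url returns two URLs because paste() is vectorized: it works element-wise across the columns, so the code above already builds one URL per row of the .csv, however many rows there are, without any loop or apply() call. A minimal sketch with a hypothetical two-row data frame standing in for the imported .csv (paste0() is simply paste(..., sep = "")):

```r
# Hypothetical stand-in for the imported .csv
data <- data.frame(
  streetnumber = c(1150, 701),
  street = c("FM 1960 West Road", "Keller Parkway"),
  city = c("Houston", "Keller"),
  state = c("TX", "TX"),
  stringsAsFactors = FALSE
)

# paste0() recycles over every element of each column,
# so this yields one URL per address with no explicit loop
url <- paste0("http://maps.googleapis.com/maps/api/geocode/xml?address=",
              data$streetnumber, "+", gsub(" ", "+", data$street), ",+",
              data$city, ",+", data$state, "&sensor=false")

length(url) == nrow(data)
```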

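I would like this to happen for every address in the .csv file without having to specify how many URLs should be generated. I suspect this is a job for an apply function, but I am not sure how to go about it. Once the interaction between R and the API is automated, I want to parse the returned .xml so that I can extract the information I am looking for.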
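The remaining step, one request per URL, fits lapply(), which calls a function once for every element of the URL vector without needing a count. A minimal sketch of that pattern: fetch_latlon() is a hypothetical placeholder that returns NA coordinates so the example runs offline; in real use its body would download and parse the XML for one URL (as the answers below do). do.call(rbind, ...) then stacks the one-row results in the same order as the input.

```r
# Hypothetical URLs, one per address (in practice, the url vector from the question)
urls <- c("http://example.invalid/geocode?address=1",
          "http://example.invalid/geocode?address=2")

# Hypothetical placeholder for "download and parse one URL";
# it returns NA coordinates so this sketch runs without network access
fetch_latlon <- function(u) {
  data.frame(url = u, lat = NA_real_, lon = NA_real_,
             stringsAsFactors = FALSE)
}

# lapply() calls fetch_latlon() once per URL, however many there are
pieces <- lapply(urls, fetch_latlon)

# Stack the one-row data frames into one, keeping the input order
result <- do.call(rbind, pieces)
result
```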

3 answers:

Answer 0 (score: 6)

The ggmap package has a geocode function, and I strongly recommend using it rather than reinventing the wheel here.

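Edit: since you said "multiple addresses", you may prefer my version, which takes a data.frame, has some robustness checks built in for batch geocoding, and allows the use of the Bing Maps API (with a limit of 25K requests per day, rather than the 2.5K per day of Google Maps).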

Answer 1 (score: 4)

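It is not clear from the question exactly what you want to get from Google. I am assuming it is the latitude and longitude. If so, try something like the code following the screenshot. Edit: modified to include an alternative (and simpler) approach using the geocode function from the ggmap package, following Ari B. Friedman's comment.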

[Screenshot]

# Read in the text from your example
mydf <- read.csv(text = "streetnumber|street|city|state
1150|FM 1960 West Road|Houston|TX
701|Keller Parkway|Keller|TX", header = TRUE, sep = "|", check.names = FALSE,
                 stringsAsFactors = FALSE)

# APPROACH 1 - works but Approach 2 probably better (see below)
# Create a new column for the URL to pass to Google API
# (separators between the fields: "+" after the number, ",+" before city and state)
mydf$url <- with(mydf, paste("http://maps.googleapis.com/maps/api/geocode/xml?address=",
                             streetnumber, "+",
                             gsub(" ", "+", street), ",+",
                             city, ",+",
                             state,
                             "&sensor=false",
                             sep = ""))

# Check to see what we have in the data frame
str(mydf)

library(XML)
latlon <- lapply(mydf$url, function(x) { # process each element in the column 'url'
       myxml <- xmlTreeParse(x, useInternal = TRUE) # pass the element (an URL) to the XML function
       # parse the result
       lat = xpathApply(myxml, '/GeocodeResponse/result/geometry/location/lat', xmlValue)[[1]]
       lon = xpathApply(myxml, '/GeocodeResponse/result/geometry/location/lng', xmlValue)[[1]]
       data.frame(lat = lat, lon = lon, stringsAsFactors = FALSE) # return lat/lon as a data frame (character for now, converted below)
   })

# We end up with a list of one-row data frames, so stack them into one;
# rbind keeps the rows in the same order as the addresses in mydf
latlon <- do.call(rbind, latlon)

# Then bolt the columns on to your existing data frame
mydf <- cbind(mydf, latlon, stringsAsFactors = FALSE)

# We want the latitude and longitude to be numbers, not characters
mydf$lat <- as.numeric(mydf$lat)
mydf$lon <- as.numeric(mydf$lon)

require(ggmap)

# APPROACH 2 - let ggmap do the heavy lifting (and 
# comment out Approach 1 if you use this)

mydf$location <- with(mydf, paste(streetnumber,street, city, state,sep = ", "))

latlon <- geocode(mydf$location)
mydf <- cbind(mydf, latlon, stringsAsFactors = FALSE)

# Now plot.
# Be careful when specifying the zoom argument, because larger values can cause
# points to be dropped by geom_point()
ggmap(get_googlemap(maptype = 'roadmap', zoom = 6, scale = 2), extent = 'panel') +
       geom_point(data = mydf, aes(x = lon, y = lat), fill = "red", colour = "black",
                  size = 3, shape = 21)

Answer 2 (score: 1)

When using the Google Maps API, it is better to use its JSON API: XML is not as lightweight as JSON.

For continuity, I modified your original code only slightly and used the RJSONIO package.

## I read your data
dat <- read.table(text = '
streetnumber | street | city | state
1150 | FM 1960 West Road | Houston | TX
701 | Keller Parkway | Keller | TX', header = TRUE, sep = '|',
                  strip.white = TRUE, stringsAsFactors = FALSE)  # trim spaces around "|"

library(RJSONIO)
## use JSON in place of XML
## the static part of the url request
url.base <- "http://maps.googleapis.com/maps/api/geocode/json?address="

## I create a data.frame with your formatted data
dat2 <- data.frame(
  streetnumber1 = paste(dat$streetnumber,sep=""),
  street2 = paste(gsub(" ","+",dat$street),sep=""),
  city1 = paste(dat$city,sep=""),
  state1 = paste(dat$state,sep=""))

## I use apply here to call it for each row
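lat.long <- apply(dat2, 1, function(x){
  url <- paste(url.base, x[1], "+", x[2],
               ",+", x[3], ",+", x[4], "&sensor=false", sep = "")
  ## download the response and parse it into a nested list
  res <- fromJSON(paste(readLines(url, warn = FALSE), collapse = ""))
  ## e.g. get lat/long of the first match ("bounds" is not always present)
  res$results[[1]]$geometry$location
})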

Here res is just a list, so you can easily subset and parse it.
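To pull the coordinates out of res, index into its nested results list. A sketch, assuming RJSONIO's default simplification, under which the location object may come back as a named numeric vector; [[ works on either that or a plain list. The helper name latlon_from_response() is hypothetical, and the sample res is hand-built with dummy values rather than a real API response. Checking status first avoids an indexing error when Google finds no match (status "ZERO_RESULTS").

```r
# Hypothetical helper: extract lat/lon from one parsed geocoding response
latlon_from_response <- function(res) {
  # No usable result: return NA instead of failing on results[[1]]
  if (!identical(res$status, "OK") || length(res$results) == 0) {
    return(c(lat = NA_real_, lon = NA_real_))
  }
  loc <- res$results[[1]]$geometry$location
  # [[ works whether loc is a named numeric vector or a list
  c(lat = as.numeric(loc[["lat"]]), lon = as.numeric(loc[["lng"]]))
}

# Hand-built response with the same nesting (dummy values, not real data)
res <- list(
  status = "OK",
  results = list(list(geometry = list(location = c(lat = 1.5, lng = -2.5))))
)

latlon_from_response(res)
```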