Getting JavaScript Google Maps coordinates via web scraping

Date: 2011-12-28 09:16:48

Tags: r web-scraping

I want to get the coordinates of each property listing (webpage). They appear in this part of the page source:

<script type='text/javascript'>
    //<![CDATA[

  var GMap_1 = null;
  //  Call this function when the page has been loaded
  function GMap_initialize_1()
  {
    var mapOptions = {
      scrollwheel: 0,
      center: new google.maps.LatLng(52.3824, 16.8798),
      zoom: 17,
      mapTypeId: google.maps.MapTypeId.ROADMAP,
      mapTypeControl: true,
      mapTypeControlOptions: {style: google.maps.MapTypeControlStyle.DROPDOWN_MENU}
    };
    GMap_1 = new google.maps.Map(document.getElementById("GMap_map_1"), mapOptions);
    iconPrecision = new google.maps.MarkerImage("/images/precisionIcon2.png"); iconPrecision.iconSize = new google.maps.Size(12,20);iconPrecision.iconAnchor = new google.maps.Point(6,20);iconPrecision.infoWindowAnchor = new google.maps.Point(6,3);iconPrecision.shadow = "";iconPrecision.shadowSize = new google.maps.Size(22,20);
    marker = new google.maps.Marker({
      icon: iconPrecision,
      position: new google.maps.LatLng(52.3824, 16.8798),
      map: GMap_1
    });


  }
GMap_initialize_1();
    //]]>
  </script>

My code looks like this:

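library(XML)

url <- "http://www.oferty.net/mieszkanie-na-sprzedaz-os-jana-iii-sobieskiego-poznan-inne,918281305"
doc <- htmlParse(url)
# xpathSApply returns a character vector with one element per <script> tag
wsp <- xpathSApply(doc, "//script[@type='text/javascript']", xmlValue)
# keep only the first script that mentions google.maps.LatLng
wsp <- wsp[grepl("google.maps.LatLng", wsp, fixed = TRUE)][1]
# the third piece starts right after the second LatLng( call (the marker position)
geocode <- strsplit(wsp, "google.maps.LatLng\\(")[[1]][3]
geocode <- strsplit(geocode, "\\)")[[1]][1]
Lat <- as.numeric(strsplit(geocode, ", ")[[1]][1])
Lng <- as.numeric(strsplit(geocode, ", ")[[1]][2])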

Do you think there is a better way? I have not found any examples that use the XML or RCurl packages to scrape the JavaScript part of an HTML page.

1 Answer:

Answer 0: (score: 0)

You can use a regular expression to extract the coordinates.

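library(RCurl)

url <- "http://www.oferty.net/mieszkanie-na-sprzedaz-os-jana-iii-sobieskiego-poznan-inne,918281305"
page <- getURL(url)
# with perl=TRUE, regexpr stores the start of each capture group in the "capture.start" attribute
pos <- regexpr("LatLng\\((\\d{2}\\.\\d{4}), (\\d{2}\\.\\d{4})\\)", page, perl=TRUE)
# each captured number is 7 characters long (two digits, a dot, four digits)
LatLng <- unlist(lapply(attr(pos,"capture.start"),
                 function(x,y) as.numeric(substr(y,x,x+6)),y=page))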
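
The pattern above only matches numbers with exactly two integer digits and four decimals, so it would miss negative values or coordinates written with a different precision. As a rough sketch of one way to relax that, the base R functions regexec() and regmatches() can return the capture groups directly. The helper name extract_latlng and the inline sample string are illustrative assumptions; in practice the text would be the page source returned by getURL(url).

```r
# Illustrative sample of the page source (assumption: stands in for getURL(url)).
page <- paste(
  "center: new google.maps.LatLng(52.3824, 16.8798),",
  "position: new google.maps.LatLng(52.3824, 16.8798),"
)

# A number: optional minus sign, digits, optional decimal part.
num <- "-?[0-9]+\\.?[0-9]*"
pattern <- paste0(
  "LatLng\\([[:space:]]*(", num, ")[[:space:]]*,[[:space:]]*(", num, ")[[:space:]]*\\)"
)

# Hypothetical helper: returns the first LatLng(...) pair found in `text`,
# or NA values when the text contains no such call.
extract_latlng <- function(text) {
  # regexec() records the full match and each capture group;
  # regmatches() turns those positions into the matched substrings.
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) == 0) {
    return(c(lat = NA_real_, lng = NA_real_))
  }
  c(lat = as.numeric(m[2]), lng = as.numeric(m[3]))
}

coords <- extract_latlng(page)
print(coords)
```

Because the numbers come from the capture groups rather than from fixed character offsets, the same helper should also cope with input such as "LatLng(-33.86, 151.2)".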