I want to get the coordinates of each real-estate listing (webpage). I can take them from this part of the page source:
<script type='text/javascript'>
//<![CDATA[
var GMap_1 = null;
// Call this function when the page has been loaded
function GMap_initialize_1()
{
var mapOptions = {
scrollwheel: 0,
center: new google.maps.LatLng(52.3824, 16.8798),
zoom: 17,
mapTypeId: google.maps.MapTypeId.ROADMAP,
mapTypeControl: true,
mapTypeControlOptions: {style: google.maps.MapTypeControlStyle.DROPDOWN_MENU}
};
GMap_1 = new google.maps.Map(document.getElementById("GMap_map_1"), mapOptions);
iconPrecision = new google.maps.MarkerImage("/images/precisionIcon2.png"); iconPrecision.iconSize = new google.maps.Size(12,20);iconPrecision.iconAnchor = new google.maps.Point(6,20);iconPrecision.infoWindowAnchor = new google.maps.Point(6,3);iconPrecision.shadow = "";iconPrecision.shadowSize = new google.maps.Size(22,20);
marker = new google.maps.Marker({
icon: iconPrecision,
position: new google.maps.LatLng(52.3824, 16.8798),
map: GMap_1
});
}
GMap_initialize_1();
//]]>
</script>
My code looks like this:
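library(XML)  # htmlParse(), xpathSApply(), xmlValue()
url<-"http://www.oferty.net/mieszkanie-na-sprzedaz-os-jana-iii-sobieskiego-poznan-inne,918281305"
doc<-htmlParse(url)
# xpathSApply() returns a character vector, which strsplit() requires
wsp<-xpathSApply(doc,"//script[@type='text/javascript']", xmlValue)
# Text following the second "google.maps.LatLng(" in the first script (the marker position)
geocode<-strsplit(wsp,"google.maps.LatLng\\(")[[1]][3]
geocode<-strsplit(geocode,"\\)")[[1]][1]
Lat<-as.numeric(strsplit(geocode,", ")[[1]][1])
Lng<-as.numeric(strsplit(geocode,", ")[[1]][2])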
Do you think there is a better way? I have not found any examples that use the XML or RCurl packages to scrape the JavaScript part of a page's HTML.
Answer 0 (score: 0)
You can use a regular expression to grab the coordinates.
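library(RCurl)  # provides getURL()
url<-"http://www.oferty.net/mieszkanie-na-sprzedaz-os-jana-iii-sobieskiego-poznan-inne,918281305"
page <- getURL(url)
# Find the first LatLng(lat, lng) call; perl=TRUE records where each capture group starts
pos <- regexpr("LatLng\\((\\d{2}\\.\\d{4}), (\\d{2}\\.\\d{4})\\)", page, perl=TRUE)
# Each captured number is 7 characters long (two digits, a dot, four digits)
LatLng <- unlist(lapply(attr(pos,"capture.start"),
function(x,y) as.numeric(substr(y,x,x+6)),y=page))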
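The same idea can also be written with base R's regexec() and regmatches(), which return the captured groups directly instead of their start positions. The sketch below is an illustration only and makes two assumptions: the sample string stands in for the downloaded page (it is one line copied from the script in the question), and the pattern is loosened to accept a minus sign and any number of decimal places, which the original pattern does not.

```r
# Hypothetical stand-in for the downloaded page: one line of the inline script
page <- 'center: new google.maps.LatLng(52.3824, 16.8798),'

# Two capture groups (latitude, longitude); optional minus sign,
# any number of decimal places, optional whitespace after the comma
pattern <- "LatLng\\((-?[0-9]+\\.?[0-9]*),[[:space:]]*(-?[0-9]+\\.?[0-9]*)\\)"

# regexec() locates the full match and each capture group;
# regmatches() converts those positions into strings
m <- regmatches(page, regexec(pattern, page))[[1]]

# m[1] is the whole match, m[2] and m[3] are the captured numbers
coords <- c(Lat = as.numeric(m[2]), Lng = as.numeric(m[3]))
print(coords)
```

With the real page, `page` would be the string returned by getURL(url). If the pattern does not match, `m` is empty and both coordinates come out as NA, so the result is worth checking before it is used.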