Scraping latitude and longitude using XPath

Time: 2012-03-25 15:29:06

Tags: xpath screen-scraping latitude-longitude

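I am trying my hand at web data mining. I am using XPath to scrape images from a web page, along with the latitude and longitude of a map shown on it. The images are in a nested table, and the path to this series of images is giving me trouble. The latitude and longitude sit inside a script, and I need to extract them as well. Does anyone know how to go about this?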

Here is my code for the images:

    <?php
    $baseUrl='http://www.hassconsult.co.ke/';
    $dom = new DOMDocument();
    @$dom->loadHTMLFile('http://www.hassconsult.co.ke/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&google=1');
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query("//div[@id='mainbody']/table") as $table) {
        echo $xpath->query(".//img", $table)->item(0)->getAttribute('src') . "\n";
        echo $xpath->query(".//tr[3]/td/a/img", $table)->item(0)->getAttribute('src') . "\n";
    }

    ?>

Here is the table outline for the images, as shown in Firebug:

    <div id="mainbody">
    <table width="502" cellspacing="0" cellpadding="0" border="0">
    <tbody>
    <tr>
    <tr>
    <tr>
    <td valign="top" height="69">
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=1">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_1.jpg">
    </a>
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=2">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_2.jpg">
    </a>
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=3">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_3.jpg">
    </a>
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=4">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_4.jpg">
    </a>
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=5">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_5.jpg">
    </a>
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=6">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_6.jpg">
    </a>
    <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&photo_no=7">
    <img width="50" height="45" border="0" style="border:1px #993300 solid;" alt="logo" src="/images/photos/p569_7.jpg">
    </a>
    <div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;font-weight:bold;color:#FF9900;padding-top:10px;"></div>
    <strong>
    <br>
    <div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;color:#000000;text-align:justify;">
    </td>
    </tr>
    <tr>
    <tr>
    </tbody>
    </table>
    </div>
    </div>
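
In the outline above, every thumbnail is an `img` nested inside an `a` within a single table cell, so selecting by that nesting may be more robust than a fixed `tr[3]` position. The following is only a sketch under that assumption: the sample markup is a cut-down stand-in for the real page, and `collectImageUrls` is a hypothetical helper name, not something from the original code.

```php
<?php
// Sample markup modelled on the Firebug outline (an assumption: the live page may differ).
$html = <<<'HTML'
<div id="mainbody">
<table><tbody>
<tr><td>header row</td></tr>
<tr><td>
<a href="/index.php?photo_no=1"><img alt="logo" src="/images/photos/p569_1.jpg"></a>
<a href="/index.php?photo_no=2"><img alt="logo" src="/images/photos/p569_2.jpg"></a>
</td></tr>
</tbody></table>
</div>
HTML;

// Hypothetical helper: returns an absolute URL for every linked thumbnail.
function collectImageUrls(string $html, string $baseUrl): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $urls = [];
    // Select the src attribute of each img that sits directly inside a link.
    foreach ($xpath->query("//div[@id='mainbody']//a/img/@src") as $attr) {
        $urls[] = rtrim($baseUrl, '/') . '/' . ltrim($attr->nodeValue, '/');
    }
    return $urls;
}

$urls = collectImageUrls($html, 'http://www.hassconsult.co.ke/');
print_r($urls);
```

For the live page, the same query could be run against the `DOMXPath` object built from `loadHTMLFile`, provided the real markup matches the outline.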

Here is the function that contains the geo point, as shown in Firebug:

    <tr>
    <td>
    <br>
    <br>
    <div id="map_canvas" style="width: 450px; height: 450px; position: relative; background-color: rgb(229, 227, 223);">
    <script>
    function initialize() {
    if (GBrowserIsCompatible()) {
    var map = new GMap2(document.getElementById("map_canvas"));
    var point = new GLatLng(-1.242831, 36.777805);//how will i pick these values and echo
    map.setCenter(point, 16);
    map.addControl(new GLargeMapControl());
    map.addControl(new GMapTypeControl());
    map.addControl(new GScaleControl());
    map.setUIToDefault();
    map.setMapType(G_NORMAL_MAP);
    var marker = new GMarker(point);
    map.addOverlay(marker);
    GEvent.addListener(marker, "click", function() {marker.openInfoWindowHtml('<h3>Hillview<br /><a href="http://maps.google.com/maps?saddr=&daddr=' + point.toUrlValue() + '" target ="_blank">Get Directions<\/a>');});
    }
    }
    window.onload = initialize;
    window.onunload = GUnload;
    </script>
    </td>
    </tr>

1 answer:

Answer 0 (score: 0)

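You can use XPath to get the script content, but to pull out the values themselves you then need either a JavaScript parser (ideally) or a regular expression: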

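    // Assumes the map script is the one mentioning GLatLng; "//script" alone returns the first script on the page.
    $src = $xpath->query("//script[contains(., 'GLatLng')]")->item(0)->nodeValue;
    // \s* tolerates any spacing around the comma.
    if (preg_match('/GLatLng\(\s*([\d.-]+)\s*,\s*([\d.-]+)\s*\)/', $src, $m)) {
        list(, $lat, $lng) = $m;
    }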
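
Putting the XPath lookup and the regular expression together, a self-contained sketch might look like the following. The sample markup is a shortened stand-in for the script in the question, and `extractLatLng` is a hypothetical helper name. It checks every `script` element instead of assuming the map script is the first one, and returns `null` when no `GLatLng(...)` call is found.

```php
<?php
// Sample markup modelled on the script in the question (the surrounding page is assumed).
$html = <<<'HTML'
<html><body>
<script>var unrelated = 1;</script>
<div id="map_canvas"></div>
<script>
function initialize() {
  var map = new GMap2(document.getElementById("map_canvas"));
  var point = new GLatLng(-1.242831, 36.777805);
  map.setCenter(point, 16);
}
</script>
</body></html>
HTML;

// Hypothetical helper: returns [lat, lng] as floats, or null when nothing matches.
function extractLatLng(string $html): ?array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Signed decimal numbers separated by a comma, with optional whitespace.
    $pattern = '/GLatLng\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/';
    foreach ($xpath->query('//script') as $script) {
        if (preg_match($pattern, $script->nodeValue, $m)) {
            return [(float) $m[1], (float) $m[2]];
        }
    }
    return null;
}

$coords = extractLatLng($html);
var_dump($coords);
```

A regular expression is enough here because the coordinates appear as plain numeric literals. If the page computed them from variables, a real JavaScript parser would be the safer route.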