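I believe the page's markup is part of the problem I am running into, so I am posting the source in a JSFiddle along with the original GIS page.
I want to get information such as the name and address from the table at the bottom.
Attempted solution:
I wrote the code below hoping to see all of the table data, but the table I want data from returns nothing.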
<?php
$k=0;
$num=1000;
var_dump(libxml_use_internal_errors(true));
$domOb = new DOMDocument();
$html = @$domOb->loadHTMLFile('http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=query&key=372215634301&type=P');
$domOb->preserveWhiteSpace = false;
$items = $domOb->getElementsByTagName('td');
// Stop at the end of the node list so item() never returns null
while ($k < (int)$num && $k < $items->length) {
    echo $items->item($k++)->nodeValue.'<br>';
}
?>
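All that is returned is:
bool(false) Real Estate Search - Legacy Map Layers visible FAQ's Help GIS Home
So I am hoping someone can tell me what I am doing wrong that makes me miss all the data I am looking for. How can I extract the name and address as easily and simply as possible?
I tried the following using XPath, but got a lot of warnings...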
$dom = new DOMDocument;
$dom->load('http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=query&key=372215634301&type=P');
$s = simplexml_import_dom($dom);
echo $name = $s->xpath('//table[@class="words13]/td[contains(text(), "Name:")]');
echo $add = $s->xpath('//table[@class="words13]/td[contains(text(), Address:)]');
Using user2518542's code combined with hakre's code, I ended up with the following:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=QUERY&key=372215634301&type=P&width=1280&height=923");
curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout after 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result=curl_exec ($ch);
curl_close ($ch);
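$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($result);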
$tds = $doc->getElementsByTagname('td');
foreach($tds as $td) {
printf(" * %s\n", $td->textContent);
echo '<br>';
}
This successfully prints out all of the tags.
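Answer 0 (score: 2)
The table cells you are looking for are not part of that HTML document. You first need to understand the basics of how web pages are written; I suggest you borrow some books on the subject and read them.
Library time ;)
If the table cells are in the document (it seems to vary; sometimes they are, sometimes they are not), the original example shows them. It also demonstrates how to iterate over a DOMNodeList: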
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTMLFile('Catawba County Legacy Map Server.html');
$tds = $doc->getElementsByTagname('td');
foreach($tds as $td) {
printf(" * %s\n", $td->textContent);
}
Example output:
php "test.php" (in directory: /home/hakre/php/test)
*
* Real Estate Search - Legacy
*
*
*
*
*
*
*
*
*
* Map Layers
* visible
*
*
* Parcels
*
* Parcel Annotation
*
* Address Points
*
* Misc. Lines
*
* Structures
*
* Contour Lines
*
* Soils
*
* Townships
*
* Water Features
*
* Tiles
*
* Flood Zone
*
* Agricultural District
*
* Aerial 2009
*
* Aerial 2005
*
* Aerial 2002
*
* Cities
*
* Print the Map
* Print Map and Parcel Report
* Print the Parcel Report
* Assessment Report
* List all Owners
* Deed History Report
* Parcel Information:
* Owner Information:
* Parcel ID: 372215634301
* Name: PENLEY TREASURE B
* Parcel Address: 3152 7TH AV SE
* Name2:
* City: CONOVER 28613
* Address: 5508 SWINGING BRIDGE RD
* LRK(REID): 57186
* Address2:
* Deed Book/Page: 1906/0741 Deed Image
* City: CONOVER
* Subdivision: FOREST HGTS
* State/Zip: NC 28613-7415
* Lots: 1-4
*
* Block: C
*
* Last Sale:
* School Information:
* Plat Book/Page: 8/119 Plat Image
* School District: COUNTY
* Calculated Acreage: 0.31
* Elementary School: WEBB A MURRAY
* Tax Map: 167H 04006A
* Middle School: ARNDT
* State Road:
* High School: ST STEPHENS
* Township: HICKORY
* School Map
*
*
* Tax/Value Information: Tax Rates(pdf)
* Zoning Information:
* Municipal Tax District:
* Zoning District: HICKORY
* Fire District: HICKORY RURAL
* Zoning1: OI
* Tax Account Number:
* Zoning2:
* Market Building(s) Value: $55,400
* Zoning3:
* Market Land Value: $20,300
* Zoning Overlay:
* Market Total Value: $75,700
* Small Area:
* Year Built/Remodeled: 1959
* Split Zoning District 1/2: 0/0
* Current Tax Bill
* Zoning Agency Phone Numbers
* Miscellaneous:
*
* Voter Precinct:P35
* Firm Panel Date: 9/5/2007
* Building Permits for this parcel
* Firm Panel #: 3710372200J
* WaterShed:
* 2010 Census Tract: 011000
* WaterShed Split:
* 2010 Census Block: 3035
* Parcel Report Data Descriptions
* Agricultural District:
* FAQ's
* Help
* GIS Home
Compilation finished successfully.
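If the cells are present, one way to get from that flat list to the name and address is to split each cell's text at the first colon. This is only a minimal sketch, assuming every cell holds a "Label: value" pair as in the output above. The inline $html is a made-up stand-in for the saved page, not its real markup, and $fields is a name introduced here for illustration.

```php
<?php
// Hypothetical sample markup standing in for the saved page (assumption).
$html = <<<HTML
<table>
  <tr><td>Parcel ID: 372215634301</td><td>Name: PENLEY TREASURE B</td></tr>
  <tr><td>Parcel Address: 3152 7TH AV SE</td><td>Name2: </td></tr>
  <tr><td>City: CONOVER 28613</td><td>Address: 5508 SWINGING BRIDGE RD</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$fields = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // Collapse runs of whitespace, then split at the first colon.
    $text = trim(preg_replace('/\s+/', ' ', $td->textContent));
    $pos = strpos($text, ':');
    if ($pos === false) {
        continue; // not a "Label: value" cell
    }
    $label = trim(substr($text, 0, $pos));
    $value = trim(substr($text, $pos + 1));
    // Some labels (such as "City") occur twice; keep the first occurrence.
    if ($label !== '' && !array_key_exists($label, $fields)) {
        $fields[$label] = $value;
    }
}

// A missing key means the cell was not in the document that was loaded.
echo $fields['Name'] ?? '(not in document)', "\n";
echo $fields['Address'] ?? '(not in document)', "\n";
```

Checking for the key before using it matters here, because the cells are not always part of the document.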
Answer 1 (score: 1)
Use XPath to look for //table[@class="words13"]//td[contains(text(), 'Name:')]
and //table[@class="words13"]//td[contains(text(), 'Address:')]
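For what it is worth, here is a rough sketch of running such queries with DOMXPath. It assumes the table really carries class="words13" and that the cells are in the loaded document at all. The inline $html is invented sample markup, and firstCellValue() is a hypothetical helper, not part of any library. It uses starts-with() rather than contains(), because contains(..., 'Address:') would also match the "Parcel Address:" cell.

```php
<?php
// Hypothetical sample markup; the real page's structure may differ.
$html = <<<HTML
<table class="words13">
  <tr><td>Parcel Address: 3152 7TH AV SE</td><td>Name: PENLEY TREASURE B</td></tr>
  <tr><td>City: CONOVER 28613</td><td>Address: 5508 SWINGING BRIDGE RD</td></tr>
</table>
HTML;

// Hypothetical helper: value of the first cell whose text starts with $label.
function firstCellValue(DOMXPath $xpath, string $label): ?string
{
    // $label must not contain double quotes, since it is embedded in the query.
    $query = sprintf(
        '//table[@class="words13"]//td[starts-with(normalize-space(.), "%s")]',
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // invalid query or no matching cell
    }
    $text = trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
    return trim(substr($text, strlen($label)));
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$name = firstCellValue($xpath, 'Name:');
$address = firstCellValue($xpath, 'Address:');
echo $name ?? '(no match)', "\n";
echo $address ?? '(no match)', "\n";
```

A null result means the query matched nothing, which is also what happens when the cells are missing from the fetched HTML.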
Answer 2 (score: 1)
Try this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=QUERY&key=372215634301&type=P&width=1280&height=923");
curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout after 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result=curl_exec ($ch);
curl_close ($ch);
echo $result;exit;
You will get the complete page source, and then you can pull out whatever you want with preg_replace.
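As a rough illustration of the regex route, the sketch below works on a string shaped like the page source. Note that it swaps in preg_match instead of preg_replace, since the goal is to capture a value rather than rewrite the text. The $result string here is an invented stand-in for what curl_exec would return, and extractField() is a hypothetical helper. Regexes are fragile against real-world HTML, so treat this as a quick hack under those assumptions.

```php
<?php
// Invented stand-in for the string returned by curl_exec (assumption).
$result = '<tr><td>Parcel Address: 3152 7TH AV SE</td>'
        . '<td>Name: PENLEY TREASURE B</td></tr>'
        . '<tr><td>City: CONOVER 28613</td>'
        . '<td class="x">Address: 5508 SWINGING BRIDGE RD</td></tr>';

// Hypothetical helper: text of the first <td> that begins with $label.
function extractField(string $html, string $label): ?string
{
    // The label must come right after the opening <td ...> tag, so a search
    // for "Address:" does not pick up the "Parcel Address:" cell.
    $pattern = '/<td[^>]*>\s*' . preg_quote($label, '/') . '\s*(.*?)\s*<\/td>/is';
    if (preg_match($pattern, $html, $m) === 1) {
        // Drop any nested tags and decode entities in the captured value.
        return trim(html_entity_decode(strip_tags($m[1])));
    }
    return null; // label not found in this source
}

$name = extractField($result, 'Name:');
$address = extractField($result, 'Address:');
echo $name ?? '(not found)', "\n";
echo $address ?? '(not found)', "\n";
```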