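I'm having a lot of trouble parsing the page source of a results page. The page returns data about businesses in a city: name, address, phone number, owner name, and URL. Any help would be greatly appreciated.
Here is an example of one of the results (the original file contains hundreds):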
<div class="ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER">
<div class="ListingResults_Level3_HEADER">
<div class="ListingResults_All_ENTRYTITLERIGHT">
<div><a href="/Restaurants/317-at-Montgomery-7897"><img src="/external/wcpages/images/L3more.gif" alt="317 at Montgomery"></a></div>
</div>
<div class="ListingResults_All_ENTRYTITLELEFT">
<div class="ListingResults_All_ENTRYTITLELEFTBOX"><strong><span itemprop="name"><a href="/Restaurants/317-at-Montgomery-7897">317 at Montgomery</a></span></strong></div>
</div>
</div>
<div class="ListingResults_Level3_MAIN">
<div class="ListingResults_Level3_MAINRIGHT">
<div class="ListingResults_Level3_MAINRIGHTBOX">
<div class="ListingResults_Level3_LOGO"><a href="/Restaurants/317-at-Montgomery-7897" class="ListingResults_Level3_LOGO"><img src="http://www.centerstateceo.com/external/wcpages/wcwebcontent/webcontentpage.aspx?contentid=2071" class="ListingResults_Level3_LOGOIMG"></a><div style="width:100%;height:1px;overflow:hidden;"></div>
</div>
<div class="ListingResults_MAINRIGHTBOXDIVIDER" style="width:100%;overflow:hidden;height:1px;">_</div>
<div class="ListingResults_Level3_AFFILIATIONS"></div>
</div>
</div>
<div class="ListingResults_Level3_MAINLEFT">
<div class="ListingResults_Level3_MAINLEFTBOX" itemtype="http://data-vocabulary.org/Address" itemscope="" itemprop="address"><span itemprop="street-address">317 Montgomery St.</span><br><span itemprop="locality">Syracuse</span>, <span itemprop="region">NY</span> <span itemprop="postal-code">13202 </span><div class="ListingResults_Level3_MAINCONTACT"><a href="/directory/directoryemailform.aspx?listingid=7897"><img src="/external/wcpages/images/maincontact.gif" alt="Mr. Dean Whittles">Mr. Dean Whittles</a></div>
<div class="ListingResults_Level3_PHONE1"><img src="/external/wcpages/images/phone.gif" alt="Work Phone: (315) 214-4267">(315) 214-4267</div>
</div>
</div>
</div>
<div class="ListingResults_Level3_FOOTER">
<div class="ListingResults_Level3_DESCRIPTION">
<div class="ListingResults_Level3_DESCRIPTIONBOX"></div>
</div>
<div class="ListingResults_Level3_FOOTERRIGHT">
<div class="ListingResults_Level3_FOOTERRIGHTBOX">
<div class="ListingResults_Level3_SOCIALMEDIA"></div>
</div>
</div>
<div class="ListingResults_Level3_FOOTERRIGHT">
<div class="ListingResults_Level3_FOOTERRIGHTBOX">
<div class="ListingResults_Level3_COUPONS"></div>
</div>
</div>
<div class="ListingResults_Level3_FOOTERLEFT">
<div class="ListingResults_Level3_FOOTERLEFTBOX"><span class="ListingResults_Level3_LEARNMORE"><a href="/Restaurants/317-at-Montgomery-7897" class="level3_footer_left_box_a friendly">
Learn More
</a></span><span class="ListingResults_Level3_VISITSITE"> | <a href="http://www.317syr.com" onclick="recordReferralOnClick('20947', '7897', 'W');" target="_blank">
Visit Site
</a></span><span class="ListingResults_Level3_MAP"> | <a href="javascript:void(0)" onclick="addItemToMapWithArrayIndexOf('0');recordReferralOnClick('20947', '7897', 'M');" class="level3_footer_left_box_a">Show on Map</a></span></div>
</div>
</div>
</div>
The PHP code from the comments:
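<?php
// Real-world HTML makes libxml emit warnings; collect them instead of printing.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($data);
libxml_clear_errors();

$spans = $dom->getElementsByTagName('span');
foreach ($spans as $el) {
    // The "Visit Site" and "Show on Map" spans start with a " | " text node,
    // so their link is the second child node.
    $child = $el->childNodes->item(1);
    if ($child instanceof DOMElement && $child->tagName === 'a') {
        echo $child->getAttribute('href') . "\n";
        continue;
    }
    // Microdata spans carry the field name in itemprop (name, street-address, ...).
    $user_param = $el->getAttribute('itemprop');
    $value = trim($el->nodeValue);
    if ($user_param !== '') {
        echo $user_param . ' ' . $value . "\n";
    }
}
?>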
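Looping over every span prints the values, but it loses track of which listing each value belongs to, and it never reaches the phone number or owner name because those sit in `div` elements. One possible way around that is to walk the listing containers with `DOMXPath` and query each field relative to its container. This is only a sketch based on the single sample above: `parseListings()` and the `$fields` map are hypothetical names, and the class-name fragments (`_PHONE1`, `_MAINCONTACT`, `_VISITSITE`) are assumed to be the same for every listing on the page.

```php
<?php
// Sketch: build one associative array per listing container.
// parseListings() and $fields are illustrative names, not part of the original post.
function parseListings(string $html): array
{
    libxml_use_internal_errors(true);   // keep libxml warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Field name => XPath expression, evaluated relative to one listing container.
    $fields = [
        'name'     => './/span[@itemprop="name"]',
        'street'   => './/span[@itemprop="street-address"]',
        'locality' => './/span[@itemprop="locality"]',
        'region'   => './/span[@itemprop="region"]',
        'postal'   => './/span[@itemprop="postal-code"]',
        'owner'    => './/div[contains(@class, "_MAINCONTACT")]//a',
        'phone'    => './/div[contains(@class, "_PHONE1")]',
        'url'      => './/span[contains(@class, "_VISITSITE")]/a/@href',
    ];

    $listings = [];
    $boxes = $xpath->query('//div[contains(@class, "ListingResults_All_CONTAINER")]');
    foreach ($boxes as $box) {
        $row = [];
        foreach ($fields as $key => $expr) {
            // Take the first match inside this container, or null if it is absent.
            $node = $xpath->query($expr, $box)->item(0);
            $row[$key] = $node ? trim($node->nodeValue) : null;
        }
        $listings[] = $row;
    }
    return $listings;
}
```

Calling `parseListings($data)` returns one array per listing, and any field that a listing lacks comes back as `null` rather than shifting the other values.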