Parsing page source with PHP

Time: 2014-07-17 13:56:33

Tags: php html html-parsing domdocument text-parsing

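I'm having a lot of trouble parsing the page source of a results page. The results page returns data about businesses in a city. For each business, the data includes the name, address, phone number, owner name, and URL. Any help would be greatly appreciated.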

Here is an example of one of the results (the original file contains hundreds):

<div class="ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER">
  <div class="ListingResults_Level3_HEADER">
    <div class="ListingResults_All_ENTRYTITLERIGHT">
      <div><a href="/Restaurants/317-at-Montgomery-7897"><img src="/external/wcpages/images/L3more.gif" alt="317 at Montgomery"></a></div>
    </div>
    <div class="ListingResults_All_ENTRYTITLELEFT">
      <div class="ListingResults_All_ENTRYTITLELEFTBOX"><strong><span itemprop="name"><a href="/Restaurants/317-at-Montgomery-7897">317 at Montgomery</a></span></strong></div>
    </div>
  </div>
  <div class="ListingResults_Level3_MAIN">
    <div class="ListingResults_Level3_MAINRIGHT">
      <div class="ListingResults_Level3_MAINRIGHTBOX">
        <div class="ListingResults_Level3_LOGO"><a href="/Restaurants/317-at-Montgomery-7897" class="ListingResults_Level3_LOGO"><img src="http://www.centerstateceo.com/external/wcpages/wcwebcontent/webcontentpage.aspx?contentid=2071" class="ListingResults_Level3_LOGOIMG"></a><div style="width:100%;height:1px;overflow:hidden;"></div>
        </div>
        <div class="ListingResults_MAINRIGHTBOXDIVIDER" style="width:100%;overflow:hidden;height:1px;">_</div>
        <div class="ListingResults_Level3_AFFILIATIONS"></div>
      </div>
    </div>
    <div class="ListingResults_Level3_MAINLEFT">
      <div class="ListingResults_Level3_MAINLEFTBOX" itemtype="http://data-vocabulary.org/Address" itemscope="" itemprop="address"><span itemprop="street-address">317 Montgomery St.</span><br><span itemprop="locality">Syracuse</span>, <span itemprop="region">NY</span>  <span itemprop="postal-code">13202  </span><div class="ListingResults_Level3_MAINCONTACT"><a href="/directory/directoryemailform.aspx?listingid=7897"><img src="/external/wcpages/images/maincontact.gif" alt="Mr. Dean Whittles">Mr. Dean Whittles</a></div>
        <div class="ListingResults_Level3_PHONE1"><img src="/external/wcpages/images/phone.gif" alt="Work Phone: (315) 214-4267">(315) 214-4267</div>
      </div>
    </div>
  </div>
  <div class="ListingResults_Level3_FOOTER">
    <div class="ListingResults_Level3_DESCRIPTION">
      <div class="ListingResults_Level3_DESCRIPTIONBOX"></div>
    </div>
    <div class="ListingResults_Level3_FOOTERRIGHT">
      <div class="ListingResults_Level3_FOOTERRIGHTBOX">
        <div class="ListingResults_Level3_SOCIALMEDIA"></div>
      </div>
    </div>
    <div class="ListingResults_Level3_FOOTERRIGHT">
      <div class="ListingResults_Level3_FOOTERRIGHTBOX">
        <div class="ListingResults_Level3_COUPONS"></div>
      </div>
    </div>
    <div class="ListingResults_Level3_FOOTERLEFT">
      <div class="ListingResults_Level3_FOOTERLEFTBOX"><span class="ListingResults_Level3_LEARNMORE"><a href="/Restaurants/317-at-Montgomery-7897" class="level3_footer_left_box_a friendly">
                    Learn More
                  </a></span><span class="ListingResults_Level3_VISITSITE"> | <a href="http://www.317syr.com" onclick="recordReferralOnClick('20947', '7897', 'W');" target="_blank">
                    Visit Site
                  </a></span><span class="ListingResults_Level3_MAP"> | <a href="javascript:void(0)" onclick="addItemToMapWithArrayIndexOf('0');recordReferralOnClick('20947', '7897', 'M');" class="level3_footer_left_box_a">Show on Map</a></span></div>
    </div>
  </div>
</div>

PHP code from the comments:

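<?php
// $data holds the HTML source of the results page.
libxml_use_internal_errors(true); // tolerate real-world markup that does not validate
$dom = new DOMDocument();
$dom->loadHTML($data);
libxml_clear_errors();

foreach ($dom->getElementsByTagName('span') as $el) {
    $user_param = $el->getAttribute('itemprop');
    if ($user_param !== '') {
        // Microdata spans: name, street-address, locality, region, postal-code.
        echo $user_param . ' ' . trim($el->nodeValue) . "\n";
        continue;
    }
    // Remaining spans (Learn More, Visit Site, Show on Map): print the link target.
    // Searching by tag name avoids relying on the <a> being child node 1,
    // which may instead be a text node or may not exist at all.
    $links = $el->getElementsByTagName('a');
    if ($links->length > 0) {
        echo $links->item(0)->getAttribute('href') . "\n";
    }
}
?>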
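
Since the page holds hundreds of listings, it may help to group the fields per business rather than printing spans as they are encountered. The following is only a minimal sketch, assuming every listing is wrapped in a div whose class contains ListingResults_All_CONTAINER and that the itemprop and class names match the sample listing shown earlier. The function name parseListings and the array keys are hypothetical, and the class suffixes (MAINCONTACT, PHONE1, VISITSITE) may differ for listings that are not "Level3".

```php
<?php
// Hypothetical helper (not from the original post): returns one array per listing.
function parseListings(string $html): array
{
    libxml_use_internal_errors(true);   // real-world HTML rarely validates
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Trimmed string value of the first node matching $query below $ctx ('' if none).
    $text = function (string $query, DOMNode $ctx) use ($xpath): string {
        return trim($xpath->evaluate("string($query)", $ctx));
    };

    $listings = [];
    // Assumption: one container div per business, as in the sample listing.
    $containers = $xpath->query('//div[contains(@class, "ListingResults_All_CONTAINER")]');
    foreach ($containers as $box) {
        $listings[] = [
            'name'    => $text('.//span[@itemprop="name"]', $box),
            'street'  => $text('.//span[@itemprop="street-address"]', $box),
            'city'    => $text('.//span[@itemprop="locality"]', $box),
            'state'   => $text('.//span[@itemprop="region"]', $box),
            'zip'     => $text('.//span[@itemprop="postal-code"]', $box),
            'contact' => $text('.//div[contains(@class, "MAINCONTACT")]/a', $box),
            'phone'   => $text('.//div[contains(@class, "PHONE1")]', $box),
            'url'     => $text('.//span[contains(@class, "VISITSITE")]/a/@href', $box),
        ];
    }
    return $listings;
}
```

With $data holding the page source, something like foreach (parseListings($data) as $row) { echo implode(' | ', $row), "\n"; } would print one line per business. Fields that are missing from a listing come back as empty strings, so the caller can decide how to handle incomplete entries.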

0 answers:

No answers