Scraping Google SERPs with Simple HTML DOM

Date: 2014-08-21 09:56:24

Tags: php html simple-html-dom

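I'd like to use Simple HTML DOM to scrape the HTML response of a Google SERP. I need to split it into Google ads, Google local listings and the normal (organic) results. The HTML looks like this: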

Local SERP

<div style="padding-bottom:8px">
  <div class="vsc vscl" data-extra="ludocid=14796923074808088664&amp;lumarker=A" sig="zoG">
    <div data-ved="0CDkQkgowAA">
      <div data-ved="0CDoQkQowAA"> </div>
    </div>
    <!--m-->
    <div class="g" style="padding-top:2px;line-height:18px">
      <div style="width:318px;float:left">
        <h3 class="r" style="line-height:normal"><a class="l" href="http://www.beaucare.co.uk/" onmousedown="return rwt(this,'','','','1','AFQjCNH2k6BS0xRb2CTmI-lrSbmEXI1F6Q','','0CDsQoAIwAA','','',event)">Beaucare <em>Dry Cleaners</em></a></h3>
        <span><cite class="_Ed">www.beaucare.co.uk</cite></span><br>
        <div style="display:inline-block;margin-right:5px"><span style="margin-right:5px" class="rtng">3.8</span><span class="star star-s"><span style="width:56px"></span></span></div>
        <a href="https://plus.google.com/111531266748464106005/about?hl=en&amp;socfid=web:lu:result:writeareviewplusurl&amp;socpid=1" onmousedown="return rwt(this,'','','','1','AFQjCNFVXSI2PMojhxQq6PAS5yWmYCkMgA','','0CD4Q4gkwAA','','',event)"><span style="white-space:nowrap">6 Google reviews</span></a></div>
      <div style="margin-left:26px;width:22px;float:left"><span style="height:38px;padding:0;width:22px"><a class="l" style="border:none;display:block;overflow:hidden;height:30px;width:16px" href="https://maps.google.co.uk/maps?pws=1&amp;num=100&amp;igu=1&amp;ip=0.0.0.0&amp;safe=images&amp;gl=uk&amp;gll=53.41058,-2.97794&amp;gws_rd=ssl&amp;um=1&amp;ie=UTF-8&amp;q=dry+cleaners+twickenham&amp;fb=1&amp;hq=dry+cleaners&amp;hnear=0x48760c93b240c7c3:0xe4a25f60c77e7ed1,Twickenham,+Greater+London&amp;cid=14796923074808088664&amp;sa=X&amp;ei=4731U6_DFurV0QWM1IHoBw&amp;ved=0CD8QrwswAA"><span class="lumi0 lupin" style="display:block;background:url(/images/mappins_grey.png) no-repeat;background-position:0 -35px;background-size:;height:30px;width:16px"></span></a></span></div>
      <div style="width:146px;float:left;color:#808080;line-height:18px"><span>146 Heath Rd</span><br>
        <span>Twickenham</span><br>
        <nobr><span>020 8891 5797</span></nobr></div>
      <!--n--></div>
  </div>
</div>
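A minimal sketch of pulling the fields out of a local listing like the one above. It uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM so that it runs without the library, and the helper cls() is introduced here only to mimic a CSS class selector in XPath. The class names (vscl, r) and the grey address column are read off this 2014 sample and are assumptions; Google's markup changes often.

```php
<?php
// Sketch only: class names and styles come from the 2014 sample above and
// are assumptions about Google's markup, not a stable interface.

// Hypothetical helper: XPath predicate equivalent to the CSS selector ".$name"
function cls(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

function parseLocal(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // SERP markup is not valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($doc);

    $results = [];
    foreach ($xp->query('//div[' . cls('vscl') . ']') as $block) {
        $link = $xp->query('.//h3[' . cls('r') . ']/a', $block)->item(0);
        $cite = $xp->query('.//cite', $block)->item(0);
        // Assumption: the grey column holds address and phone, one <span> per line
        $details = [];
        foreach ($xp->query('.//div[contains(@style, "color:#808080")]//span', $block) as $span) {
            $details[] = trim($span->textContent);
        }
        $results[] = [
            'name'    => $link ? trim($link->textContent) : null,
            'url'     => $link ? $link->getAttribute('href') : null,
            'site'    => $cite ? trim($cite->textContent) : null,
            'details' => $details,
        ];
    }
    return $results;
}

// Trimmed copy of the local listing above (long attributes removed)
$sample = <<<'HTML'
<div class="vsc vscl" data-extra="ludocid=14796923074808088664&amp;lumarker=A">
  <div class="g">
    <div style="width:318px;float:left">
      <h3 class="r"><a class="l" href="http://www.beaucare.co.uk/">Beaucare <em>Dry Cleaners</em></a></h3>
      <span><cite class="_Ed">www.beaucare.co.uk</cite></span>
    </div>
    <div style="width:146px;float:left;color:#808080;line-height:18px"><span>146 Heath Rd</span><br>
      <span>Twickenham</span><br>
      <nobr><span>020 8891 5797</span></nobr></div>
  </div>
</div>
HTML;

print_r(parseLocal($sample));
```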

Paid SERP (ads)

<li class="ads-ad" data-hveid="34">
  <h3><a style="display:none" href="http://www.google.co.uk/aclk?sa=L&amp;ai=C-QrY4731U4TTGYm4jAb3q4HwCsWV_qMF9can5boBtI6yLggAEAEoAlD06tiLAWC7vq6D0ArIAQGpAvwJat3my7s-qgQmT9CoSc8LXNEiEfFMf0izXjjIVgr6InoeWMZFZsdEobDCsi4h-PKAB72KqSaQBwGoB6a-Gw&amp;sig=AOD64_2ShDv4EEWhKvQJU3p6FF4V1mqfyg&amp;rct=j&amp;q=&amp;ved=0CCMQ0Qw&amp;adurl=http://ducanerichmond.co.uk/" id="s0p1"></a><a href="http://www.google.co.uk/aclk?sa=L&amp;ai=C-QrY4731U4TTGYm4jAb3q4HwCsWV_qMF9can5boBtI6yLggAEAEoAlD06tiLAWC7vq6D0ArIAQGpAvwJat3my7s-qgQmT9CoSc8LXNEiEfFMf0izXjjIVgr6InoeWMZFZsdEobDCsi4h-PKAB72KqSaQBwGoB6a-Gw&amp;sig=AOD64_2ShDv4EEWhKvQJU3p6FF4V1mqfyg&amp;rct=j&amp;q=&amp;ved=0CCMQ0Qw&amp;adurl=http://ducanerichmond.co.uk/" id="vs0p1" onmousedown="return google.arwt(this)" jsl="$x 1;$t t-zxXzjt1d4B0;$x 0;" class="r-taw5"><b>Dry Cleaning</b> Services - We <b>Dry Clean</b> all types of material‎</a></h3>
  <div class="ads-visurl"><span class="ads-badge">Ad</span><cite>www.ducanerichmond.co.uk/</cite>‎
    <div class="action-menu ab_ctl"><a class="_Su ab_button" href="#" id="am-b-1398152331" aria-label="Result details" aria-expanded="false" aria-haspopup="true" role="button" jsaction="ab.tdd;keydown:ab.hbke;keypress:ab.mskpe" data-ved="0CCQQ7B0"><span class="mn-dwn-arw"></span></a>
      <div class="action-menu-panel ab_dropdown" role="menu" tabindex="-1" jsaction="keydown:ab.hdke;mouseover:ab.hdhne;mouseout:ab.hdhue" data-ved="0CCUQqR8">
        <ul>
          <li class="action-menu-item ab_dropdownitem" role="menuitem" data-type="why_this_ad">
            <div class="action-menu-button" role="menuitem" tabindex="-1" jsaction="am.itemclk" data-ved="0CCYQgRM">Why this ad?</div>
          </li>
        </ul>
      </div>
    </div>
    <span class="_ME">020 8332 1111</span></div>
  <div class="ads-creative">Leather,Suede,Fur,Silk,&amp; Upholstery</div>
  <div class="_Fbb">
    <div class="_WE"><a href="http://www.google.co.uk/aclk?sa=L&amp;ai=CydP04731U4TTGYm4jAb3q4HwCsWV_qMF9can5boBtI6yLggAEAEoAlDjg__A_f____8BYLu-roPQCsgBAakC_Alq3ebLuz6qBCZP0KhJzwtc0SIR8Ux_SLNeOMhWCvoieh5YxkVmx0ShsMKyLiH48tIGDRC1l6QJGMfCjP4CKAeAB72KqSaQBwGoB6a-Gw&amp;sig=AOD64_21W1rEGeEJ1-HDUmzak5fcBxQWFw&amp;ctype=50&amp;rct=j&amp;q=&amp;ved=0CCkQwSk&amp;adurl=https://maps.google.co.uk/maps%3Fpws%3D1%26num%3D100%26igu%3D1%26ip%3D0.0.0.0%26safe%3Dimages%26gl%3DGB%26gll%3D53.41058,-2.97794%26gws_rd%3Dssl%26um%3D1%26ie%3DUTF-8%26daddr%3DWestminster%2BHouse,%2BKew%2BRoad,%2BRichmond,%2BSurrey%2BTW9%2B2ND%26ll%3D51.463800,-0.301657%26f%3Dd%26saddr%3D%26iwstate1%3Ddir:to%26fb%3D1%26slad%3D0ALHuxZqOuRfNxBiKpPTHEBPC-dxvZ8xRdgChl3d3cuZHVjYW5lcmljaG1vbmQuY28udWsvEhxodHRwOi8vZHVjYW5lcmljaG1vbmQuY28udWsvGjpEcnkgQ2xlYW5pbmcgU2VydmljZXMgLSBXZSBEcnkgQ2xlYW4gYWxsIHR5cGVzIG9mIG1hdGVyaWFsIiNMZWF0aGVyLFN1ZWRlLEZ1cixTaWxrLCYgVXBob2xzdGVyeSoA%26geocode%3D13444415340063753198,51463800,-301657" class="_XE"><span class="_YE"></span></a></div>
    <div class="_WE"><a href="http://www.google.co.uk/aclk?sa=L&amp;ai=CydP04731U4TTGYm4jAb3q4HwCsWV_qMF9can5boBtI6yLggAEAEoAlDjg__A_f____8BYLu-roPQCsgBAakC_Alq3ebLuz6qBCZP0KhJzwtc0SIR8Ux_SLNeOMhWCvoieh5YxkVmx0ShsMKyLiH48tIGDRC1l6QJGMfCjP4CKAeAB72KqSaQBwGoB6a-Gw&amp;sig=AOD64_21W1rEGeEJ1-HDUmzak5fcBxQWFw&amp;ctype=50&amp;rct=j&amp;q=&amp;ved=0CCoQmxA&amp;adurl=https://maps.google.co.uk/maps%3Fpws%3D1%26num%3D100%26igu%3D1%26ip%3D0.0.0.0%26safe%3Dimages%26gl%3DGB%26gll%3D53.41058,-2.97794%26gws_rd%3Dssl%26um%3D1%26ie%3DUTF-8%26daddr%3DWestminster%2BHouse,%2BKew%2BRoad,%2BRichmond,%2BSurrey%2BTW9%2B2ND%26ll%3D51.463800,-0.301657%26f%3Dd%26saddr%3D%26iwstate1%3Ddir:to%26fb%3D1%26slad%3D0ALHuxZqOuRfNxBiKpPTHEBPC-dxvZ8xRdgChl3d3cuZHVjYW5lcmljaG1vbmQuY28udWsvEhxodHRwOi8vZHVjYW5lcmljaG1vbmQuY28udWsvGjpEcnkgQ2xlYW5pbmcgU2VydmljZXMgLSBXZSBEcnkgQ2xlYW4gYWxsIHR5cGVzIG9mIG1hdGVyaWFsIiNMZWF0aGVyLFN1ZWRlLEZ1cixTaWxrLCYgVXBob2xzdGVyeSoA%26geocode%3D13444415340063753198,51463800,-301657" class="_Ebb">Westminster House, Kew Road, Richmond, Surrey</a>‎</div>
  </div>
</li>
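A similar sketch for the ads, again with PHP's built-in DOM extension standing in for Simple HTML DOM. The class names ads-ad, ads-visurl and ads-creative come from the sample; taking the last link in the h3 (the first one is hidden with display:none) is an assumption based on this one sample, and cls() is a hypothetical helper.

```php
<?php
// Sketch only: selectors are based on the single 2014 ad sample above.

// Hypothetical helper: XPath predicate equivalent to the CSS selector ".$name"
function cls(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

function parseAds(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore HTML parse warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($doc);

    $ads = [];
    foreach ($xp->query('//li[' . cls('ads-ad') . ']') as $li) {
        // The first <a> in the <h3> is hidden; the visible title is the last one
        $title = $xp->query('.//h3/a[last()]', $li)->item(0);
        $cite  = $xp->query('.//div[' . cls('ads-visurl') . ']/cite', $li)->item(0);
        $text  = $xp->query('.//div[' . cls('ads-creative') . ']', $li)->item(0);
        $ads[] = [
            'title'   => $title ? trim($title->textContent) : null,
            'display' => $cite ? trim($cite->textContent) : null,
            'text'    => $text ? trim($text->textContent) : null,
        ];
    }
    return $ads;
}

// Trimmed copy of the ad above (tracking URLs shortened, action menu removed)
$sample = <<<'HTML'
<li class="ads-ad" data-hveid="34">
  <h3><a style="display:none" href="http://www.google.co.uk/aclk?sa=L&amp;adurl=http://ducanerichmond.co.uk/" id="s0p1"></a><a href="http://www.google.co.uk/aclk?sa=L&amp;adurl=http://ducanerichmond.co.uk/" id="vs0p1"><b>Dry Cleaning</b> Services - We <b>Dry Clean</b> all types of material</a></h3>
  <div class="ads-visurl"><span class="ads-badge">Ad</span><cite>www.ducanerichmond.co.uk/</cite></div>
  <div class="ads-creative">Leather,Suede,Fur,Silk,&amp; Upholstery</div>
</li>
HTML;

print_r(parseAds($sample));
```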

Standard (organic) SERP

<li class="g"><!--m-->
  <div class="rc" data-hveid="87">
    <h3 class="r"><a href="http://www.yell.com/biz/kings-dry-cleaners-twickenham-4477896/" onmousedown="return rwt(this,'','','','5','AFQjCNGrECSzVs1i89Tupc1OMeRq_tZfjw','','0CFgQFjAE','','',event)">Kings <em>Dry Cleaners</em>, <em>Twickenham</em> | Dry Cleaners - Yell</a></h3>
    <div class="s">
      <div>
        <div class="f kv _UD" style="white-space:nowrap"><cite class="_Ed">www.yell.com/biz/kings-<b>dry</b>-<b>cleaners</b>-<b>twickenham</b>-4477896/</cite>
          <div class="action-menu ab_ctl"><a class="_Su ab_button" href="#" id="am-b4" aria-label="Result details" aria-expanded="false" aria-haspopup="true" role="button" jsaction="ab.tdd;keydown:ab.hbke;keypress:ab.mskpe" data-ved="0CFkQ7B0wBA"><span class="mn-dwn-arw"></span></a>
            <div class="action-menu-panel ab_dropdown" role="menu" tabindex="-1" jsaction="keydown:ab.hdke;mouseover:ab.hdhne;mouseout:ab.hdhue" data-ved="0CFoQqR8wBA">
              <ul>
                <li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="http://webcache.googleusercontent.com/search?q=cache:fKpsx3qZnjcJ:www.yell.com/biz/kings-dry-cleaners-twickenham-4477896/+&amp;cd=5&amp;hl=en&amp;ct=clnk&amp;gl=uk" onmousedown="return rwt(this,'','','','5','AFQjCNF_iMCDBEJfF9L_3mW57Z3Tqp0-xg','','0CFsQIDAE','','',event)">Cached</a></li>
                <li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="/search?pws=1&amp;igu=1&amp;gl=GB&amp;gll=53.41058,-2.97794&amp;near=liverpool&amp;q=related:www.yell.com/biz/kings-dry-cleaners-twickenham-4477896/+dry+cleaners+twickenham&amp;tbo=1&amp;sa=X&amp;ei=4731U6_DFurV0QWM1IHoBw&amp;ved=0CFwQHzAE">Similar</a></li>
              </ul>
            </div>
          </div>
        </div>
        <div class="f slp"><span class="csb" style="display:inline-block;position:relative;top:1px;background:url(/images/nav_logo195.png) no-repeat -100px -260px;height:13px;width:65px"><span class="csb" style="background:url(/images/nav_logo195.png) no-repeat -100px -275px;height:13px;width:39px"></span></span> Rating: 3 - ‎2 votes</div>
        <span class="st"><span class="f">22 Jul 2014 - </span>Find Kings <em>Dry Cleaners</em> in <em>Twickenham</em> on Yell. Get reviews, opening hours and directions .</span></div>
    </div>
  </div>
  <!--n--></li>
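For the organic results, a sketch under the same assumptions (PHP's built-in DOM extension, class names from this 2014 sample, hypothetical cls() helper). Since the local listing in the sample carries class="g" on a div rather than an li, restricting the query to li elements keeps it out of this set.

```php
<?php
// Sketch only: "g", "r" and "st" are class names seen in the sample above.

// Hypothetical helper: XPath predicate equivalent to the CSS selector ".$name"
function cls(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

function parseOrganic(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore HTML parse warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($doc);

    $results = [];
    // li.g only: the local listing uses div.g and is therefore skipped
    foreach ($xp->query('//li[' . cls('g') . ']') as $li) {
        $link = $xp->query('.//h3[' . cls('r') . ']/a', $li)->item(0);
        if ($link === null) {
            continue; // no title link, so not treated as an organic result
        }
        $snippet = $xp->query('.//span[' . cls('st') . ']', $li)->item(0);
        $results[] = [
            'title'   => trim($link->textContent),
            'url'     => $link->getAttribute('href'),
            'snippet' => $snippet ? trim($snippet->textContent) : null,
        ];
    }
    return $results;
}

// Trimmed copy of the organic result above (action menu and rating removed)
$sample = <<<'HTML'
<li class="g">
  <div class="rc" data-hveid="87">
    <h3 class="r"><a href="http://www.yell.com/biz/kings-dry-cleaners-twickenham-4477896/">Kings <em>Dry Cleaners</em>, <em>Twickenham</em> | Dry Cleaners - Yell</a></h3>
    <div class="s">
      <div>
        <div class="f kv _UD"><cite class="_Ed">www.yell.com/biz/kings-<b>dry</b>-<b>cleaners</b>-<b>twickenham</b>-4477896/</cite></div>
        <span class="st"><span class="f">22 Jul 2014 - </span>Find Kings <em>Dry Cleaners</em> in <em>Twickenham</em> on Yell.</span>
      </div>
    </div>
  </div>
</li>
HTML;

print_r(parseOrganic($sample));
```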

I tried the following code:

foreach($html->find('li .g') as $e) {
  echo $e->innertext . '<br><br>';  
}

But this doesn't display any results. I also tried ->find('h3 .r'), but I still get nothing.
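A likely cause is the space in the selectors. In CSS-style selectors, which Simple HTML DOM's find() follows, a space is the descendant combinator: 'li .g' asks for elements with class g nested inside an li, whereas the sample puts class="g" on the li itself; 'h3 .r' has the same problem. The sketch below shows the difference with equivalent XPath queries in PHP's built-in DOM extension (used so it runs without the library); cls() is a hypothetical helper standing in for a class selector.

```php
<?php
// Compare the two selector readings on a trimmed organic result.
// XPath stand-ins: "li .g" -> //li//*[class g], "li.g" -> //li[class g]

// Hypothetical helper: XPath predicate equivalent to the CSS selector ".$name"
function cls(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$html = '<ul><li class="g"><div class="rc"><h3 class="r">'
      . '<a href="http://www.yell.com/biz/kings-dry-cleaners-twickenham-4477896/">'
      . 'Kings Dry Cleaners, Twickenham | Dry Cleaners - Yell</a></h3></div></li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // ignore HTML parse warnings
$doc->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($doc);

// Number of matches for each reading of the selector
$counts = [
    'li .g (descendant)'   => $xp->query('//li//*[' . cls('g') . ']')->length,
    'li.g (same element)'  => $xp->query('//li[' . cls('g') . ']')->length,
    'h3 .r (descendant)'   => $xp->query('//h3//*[' . cls('r') . ']')->length,
    'h3.r (same element)'  => $xp->query('//h3[' . cls('r') . ']')->length,
];
print_r($counts);
```

In Simple HTML DOM terms the corresponding change would be $html->find('li.g') and $html->find('h3.r'), i.e. dropping the space. This is untested against a live Google response, and the markup a script receives may differ from what a browser shows, so it is worth dumping the fetched HTML and checking that these classes are present first.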

0 answers