Parse DOM elements in order

Posted: 2012-02-02 19:47:28

Tags: php parsing xpath domdocument

I need to parse the following markup:

<ul class="zg_hrsr">
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#15</span>
<span class="zg_hrsr_ladder">
in 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/ref=pd_zg_hrsr_kstore_1_1">Kindle Store</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/154606011/ref=pd_zg_hrsr_kstore_1_2">Kindle eBooks</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/157325011/ref=pd_zg_hrsr_kstore_1_3">Nonfiction</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/292975011/ref=pd_zg_hrsr_kstore_1_4">Lifestyle & Home</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/156699011/ref=pd_zg_hrsr_kstore_1_5">Home & Garden</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/156828011/ref=pd_zg_hrsr_kstore_1_6">Gardening & Horticulture</a>
 > 
<b>
<a href="http://www.amazon.com/gp/bestsellers/digital-text/156847011/ref=pd_zg_hrsr_kstore_1_7_last">Greenhouses</a>
</b>
</span>
</li>
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#26</span>
<span class="zg_hrsr_ladder">
in 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/ref=pd_zg_hrsr_kstore_2_1">Kindle Store</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/154606011/ref=pd_zg_hrsr_kstore_2_2">Kindle eBooks</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/157325011/ref=pd_zg_hrsr_kstore_2_3">Nonfiction</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/292975011/ref=pd_zg_hrsr_kstore_2_4">Lifestyle & Home</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/156699011/ref=pd_zg_hrsr_kstore_2_5">Home & Garden</a>
 > 
<a href="http://www.amazon.com/gp/bestsellers/digital-text/156828011/ref=pd_zg_hrsr_kstore_2_6">Gardening & Horticulture</a>
 > 
<b>
<a href="http://www.amazon.com/gp/bestsellers/digital-text/156849011/ref=pd_zg_hrsr_kstore_2_7_last">House Plants</a>
</b>
</span>
</li>
</ul>

and the output I want is:

  

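  • Sellers Rank: #266,715 Paid in Kindle Store (See Top 100 Paid in Kindle Store)
  • #15 in Kindle Store > Kindle eBooks > Nonfiction > Lifestyle & Home > Home & Garden > Gardening & Horticulture > Greenhouses
  • #26 in Kindle Store > Kindle eBooks > Nonfiction > Lifestyle & Home > Home & Garden > Gardening & Horticulture > House Plants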

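How can I achieve this? All I know is that I should get the 'nodeValue' of each 'a' tag, but I'm confused about how to put them together in the format I need. I think I should use arrays, but I can't get it working because my skill level is still low.

Guidance and help, please. I only need the XPath and the array structure (if this can be done with arrays), or an alternative to arrays.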
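A minimal sketch of one possible array structure, assuming the markup is held in a PHP string (a trimmed copy of the first item is used here). The function name `parseRankings` is hypothetical, not from the question. Each `<li class="zg_hrsr_item">` becomes one entry holding the rank and an ordered list of category names taken from each link's `nodeValue`; formatting happens in a separate step.

```php
<?php
// Hypothetical helper: turns the zg_hrsr list into entries of the form
// ['rank' => '#15', 'trail' => ['Kindle Store', 'Kindle eBooks', ...]].
function parseRankings(string $html): array
{
    $dom = new DOMDocument();
    // The real markup contains unescaped "&"; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = array();
    foreach ($xpath->query("//li[@class='zg_hrsr_item']") as $item) {
        // Both queries below are relative to the current <li> (note the leading ".")
        $rank = trim($xpath->query(".//span[@class='zg_hrsr_rank']", $item)->item(0)->nodeValue);
        $trail = array();
        foreach ($xpath->query(".//span[@class='zg_hrsr_ladder']//a", $item) as $a) {
            $trail[] = trim($a->nodeValue); // document order matches breadcrumb order
        }
        $result[] = array('rank' => $rank, 'trail' => $trail);
    }
    return $result;
}

// Trimmed sample of the question's markup (illustrative only)
$html = <<<'HTML'
<ul class="zg_hrsr">
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#15</span>
<span class="zg_hrsr_ladder">in <a href="#">Kindle Store</a> > <a href="#">Kindle eBooks</a> > <b><a href="#">Greenhouses</a></b></span>
</li>
</ul>
HTML;

foreach (parseRankings($html) as $entry) {
    echo $entry['rank'] . ' in ' . implode(' > ', $entry['trail']) . "\n";
}
```

Keeping the data in an array separates extraction from formatting, so the same structure can be printed in the desired "#15 in A > B > C" form or reused elsewhere.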

1 Answer:

Answer 0 (score: 0)

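    // Load the HTML into a DOM object first (assumes $html holds the markup above);
    // @ silences libxml warnings about the unescaped "&" characters.
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    // Create XPath from your DOM object:
    $xpath = new DOMXPath($dom);
    foreach($xpath->query("//span[@class='zg_hrsr_rank']") as $rankNode){
        $rank = $rankNode->textContent;   // e.g. "#15"; keep the node itself for the next query
        $trail = array();
        // Relative path: the ladder <span> that follows this rank <span>, and its links.
        // A bare '//a' would ignore $rankNode and return every link on the page.
        foreach($xpath->query("following-sibling::span[@class='zg_hrsr_ladder']//a", $rankNode) as $step){
            $trail[] = trim($step->textContent);
        }
        echo $rank.' in '.implode(' > ',$trail)."\n";
    }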
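A detail that matters here is the second argument of `DOMXPath::query()`, the context node: an expression starting with `//` is absolute and ignores it, while `.//` or an axis such as `following-sibling::` is evaluated relative to it. A small self-contained check, using a made-up two-item list rather than the Amazon markup:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<ul><li><span>#1</span><a>A</a></li><li><span>#2</span><a>B</a></li></ul>');
$xpath = new DOMXPath($dom);

// Context node: the first <li> only
$firstItem = $xpath->query('//li')->item(0);

// Absolute path: the context node is ignored, so both links match
$absolute = $xpath->query('//a', $firstItem);
// Relative path: only links inside the first <li> match
$relative = $xpath->query('.//a', $firstItem);
// Axis step from the <span>: <a> siblings that follow it inside the same <li>
$span = $xpath->query('.//span', $firstItem)->item(0);
$sibling = $xpath->query('following-sibling::a', $span);

echo $absolute->length, ' ', $relative->length, ' ', $sibling->length, "\n"; // 2 1 1
```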