XPath - How to get the content of a div but exclude an inner div

Time: 2014-05-27 04:06:59

Tags: xpath

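I want to grab all the text in this div that comes after the <p class="meta"></p> element. In other words, I want the div's content, but only the part that follows the closing p tag; everything inside the p itself should be excluded.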

Here is the full code:

<div id="post">
    <p class="meta"> <img src='http://images.test.com/bin/famfamfam_flags/png/gb.png' border="0" align="absmiddle" alt='Flag of United Kingdom' onerror="this.onerror=null; this.src='/bin/famfamfam_silk/gifs/flag_blue.gif'; return false;"/>
        &nbsp;
        <a href="http://www.test-page.html" rel="tag" class="location">New Zealand</a>,
        <a href="http://www.test.com/s/England" title="title text">a link</a>,
        <a href="http://www.test.com/test-page.html" rel="tag">NZ</a>
        <br/>
        <span class="date">Sunday, November 25, 2012</span>
        <br/>
        <iframe class="like_frame" scrolling="no" frameborder="0" style="border:none ;overflow:hidden; width:327px; padding-top:14px; height:24px;" allowTransparency="true"></iframe>
    </p>
    Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text  Lorum ipsum text.
</div>

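1 answer:

Answer 0 (score: 1)

You can use the following-sibling axis, which selects every node that comes after the p element under the same parent: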

//p[@class="meta"]/following-sibling::node()

Demo (using xmllint):

$ xmllint index.html --xpath '//p[@class="meta"]/following-sibling::node()'
Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum
text Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum text Lorum ipsum text.
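
For anyone without xmllint at hand, here is a minimal sketch of the same idea in Python, using only the standard library. Note that this is a swapped-in technique, not the XPath above: xml.etree.ElementTree does not support the following-sibling axis, so the sketch reads the .tail attribute of the p element instead, which holds the text that directly follows its closing tag. The SAMPLE markup is a simplified, well-formed stand-in for the question's HTML, and the function name text_after_meta is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the question's markup (hypothetical sample).
SAMPLE = """<div id="post">
    <p class="meta">
        <a href="http://www.test-page.html" rel="tag" class="location">New Zealand</a>
        <br/>
        <span class="date">Sunday, November 25, 2012</span>
    </p>
    Lorum ipsum text  Lorum ipsum text.
</div>"""


def text_after_meta(xml_source: str) -> str:
    """Return the text directly following the closing tag of <p class="meta">."""
    root = ET.fromstring(xml_source)
    meta = root.find(".//p[@class='meta']")
    if meta is None:
        return ""
    # ElementTree keeps the text that follows an element's end tag in .tail;
    # split/join collapses runs of whitespace and trims the ends.
    return " ".join((meta.tail or "").split())


print(text_after_meta(SAMPLE))
```

Unlike following-sibling::node(), .tail only covers the text up to the next sibling element, so this is enough for the markup in the question but would need a loop over the later siblings if more elements followed the p. It also assumes the input parses as XML; real-world HTML with entities such as &nbsp; would need an HTML-aware parser instead.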