Question

Code

    <div id="content">
        <div class="sample">sample text</div>
        <div class="datebar">
            <span style="float:right">some text1</span>
            <b>some text2</b>
        </div>
        <p>paragraph 1</p>
        <p>paragraph 2</p>
    </div>

I want to get the data inside the <p> elements or, put another way, the data that comes after <div class="datebar">.

Answer 1

//div[@id="content"]/p/text()

With the sample you provided, this does what you ask.

Update
If you only want the <p> elements that come after <div class="datebar">, the following should work:

//div[@id = 'content']/p[preceding-sibling::div[@class='datebar']]/text()
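For readers without a full XPath engine, here is a minimal sketch of the same selection using only Python's standard library. It assumes the markup is well-formed XML, as the question's sample is. The standard library's ElementTree supports only a subset of XPath that does not include the preceding-sibling axis, so the sketch walks the children in document order instead. The helper name paragraphs_after_datebar is hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's sample; it is well-formed, so the XML parser accepts it.
HTML = """
<div id="content">
    <div class="sample">sample text</div>
    <div class="datebar">
        <span style="float:right">some text1</span>
        <b>some text2</b>
    </div>
    <p>paragraph 1</p>
    <p>paragraph 2</p>
</div>
"""


def paragraphs_after_datebar(root):
    """Text of each <p> child of div#content that follows a div.datebar sibling.

    Hypothetical helper: it emulates p[preceding-sibling::div[@class='datebar']]
    by scanning the children in document order.
    """
    texts = []
    for div in root.iter("div"):  # iter() also visits root itself
        if div.get("id") != "content":
            continue
        seen_datebar = False
        for child in div:  # direct children only, in document order
            if child.tag == "div" and child.get("class") == "datebar":
                seen_datebar = True
            elif child.tag == "p" and seen_datebar:
                # .text is the text before any nested element
                texts.append(child.text)
    return texts


print(paragraphs_after_datebar(ET.fromstring(HTML)))
```

Real-world HTML is often not well-formed, in which case an HTML-aware parser with full XPath support is the safer choice.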

Another update, for Kirill

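Here is an HTML sample that has an extra <p> before the <div class="datebar">, with the XPath expressions tested in Python.

Obviously, the right solution depends on the contents of the full input HTML and on what the OP wants to extract, and neither is clear at the moment.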

>>> from lxml import etree
>>> doc = etree.HTML("""
... <div id="content">
...   <div class="sample">sample text</div>
...   <p>paragraph 1</p>
...   <div class="datebar">
...     <span style="float:right">some text1</span>
...     <b>some text2</b>
...   </div>
...   <p>paragraph 2</p>
...   <p>paragraph 3</p>
... </div>""")
>>> # My first suggestion
... doc.xpath("//div[@id='content']/p/text()")
['paragraph 1', 'paragraph 2', 'paragraph 3']
>>> # Kirill's solution
... doc.xpath("//div[@id = 'content' and div[@class = 'datebar']]/p/text()")
['paragraph 1', 'paragraph 2', 'paragraph 3']
>>> # My response to Kirill
... doc.xpath("//div[@id = 'content']/p[preceding-sibling::div[@class='datebar']]/text()")
['paragraph 2', 'paragraph 3']

Kirill's expression //div[@id = 'content' and div[@class = 'datebar']]/p/text() does not select "only the p elements that have a parent div with @id = 'content' and a preceding div with @class = 'datebar'", as stated in his comment.
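The difference between the two expressions comes down to where the datebar test is applied: once on the parent, or once per paragraph. The following is a rough sketch of that distinction in plain Python using the standard library's ElementTree rather than a full XPath engine. The variable names are illustrative, and the list-based sibling check is an emulation of the preceding-sibling axis, not an XPath feature of ElementTree.

```python
import xml.etree.ElementTree as ET

# Same structure as the test document: one <p> sits before the datebar div.
SAMPLE = """
<div id="content">
  <div class="sample">sample text</div>
  <p>paragraph 1</p>
  <div class="datebar">
    <span style="float:right">some text1</span>
    <b>some text2</b>
  </div>
  <p>paragraph 2</p>
  <p>paragraph 3</p>
</div>
"""

content = ET.fromstring(SAMPLE)  # the root element is div#content

# Parent-level test, like div[@id='content' and div[@class='datebar']]/p:
# it asks once whether the parent has any datebar child, then takes every <p>.
has_datebar = content.find("div[@class='datebar']") is not None
parent_level = [p.text for p in content.findall("p")] if has_datebar else []

# Per-paragraph test, like p[preceding-sibling::div[@class='datebar']]:
# each <p> is kept only if some earlier sibling is a datebar div.
children = list(content)
per_paragraph = [
    el.text
    for i, el in enumerate(children)
    if el.tag == "p"
    and any(s.tag == "div" and s.get("class") == "datebar" for s in children[:i])
]

print(parent_level)   # ['paragraph 1', 'paragraph 2', 'paragraph 3']
print(per_paragraph)  # ['paragraph 2', 'paragraph 3']
```

The parent-level version keeps "paragraph 1" because the condition is satisfied by the parent as a whole, regardless of where each paragraph sits relative to the datebar.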

Answer 2

//div[@id = 'content' and div[@class = 'datebar']]/p/text()

Extract data from a div with no class using XPath
