lxml xpath can't get the text of a <p> tag

Time: 2015-03-19 14:13:16

Tags: html lxml

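How do I get the text "Blahblah" of the p tag in this case?

When the text inside the p tag comes after a <strong> tag, lxml does not seem to pick it up: .text returns None.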

<p class="user_p"><strong>cc</strong>Blahblah</p>

====Code====

from lxml import html
content="""
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>
"""
# html.fromstring accepts a str directly; no .decode() needed (str has none in Python 3)
tree = html.fromstring(content)

p = tree.xpath('//div/p')

print(p[0].text)

print(p[1].text)

====Output====

Blahblah
None

1 answer:

Answer 0 (score: 1)

In this HTML fragment

<p class="user_p"><strong>cc</strong>Blahblah</p>

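the text "Blahblah" is the value of the tail attribute of the <strong> element, not the text of <p>. In lxml, an element's .text only holds the text before its first child element; text that follows a child element is stored as that child's .tail.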
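To make the text/tail split visible, here is a small sketch (not part of the original answer) that parses both paragraphs from the question and prints p.text next to the <strong> child's .text and .tail. The variable names `first` and `second` are just for illustration.

```python
from lxml import html

# The same two paragraphs as in the question
content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>"""

tree = html.fromstring(content)
first, second = tree.xpath('//div/p')

for p in (first, second):
    strong = p.find('strong')
    # p.text: text before <strong>; strong.tail: text after </strong>, still inside <p>
    print(repr(p.text), repr(strong.text), repr(strong.tail))

# first p:  'Blahblah' 'cc' None
# second p: None 'cc' 'Blahblah'
```

In the second paragraph nothing precedes <strong>, so p.text is None and "Blahblah" only shows up as strong.tail.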

Demo code:

from lxml import html

content = """
    <div>
     <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>"""

tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
print(s[0].tail)

Output:

Blahblah
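If you would rather not depend on which child element the text follows, two other standard lxml approaches (not from the original answer) work on the <p> element itself. This is a sketch assuming the same markup as in the question: the XPath step `text()` selects only the text nodes that are direct children of <p> (p.text plus the tails of its children), while `text_content()` concatenates all descendant text, including "cc" inside <strong>.

```python
from lxml import html

content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>"""

tree = html.fromstring(content)
paragraphs = tree.xpath('//div/p')

# Only the <p>'s own text nodes; the text inside <strong> is skipped
own_text = [''.join(p.xpath('text()')) for p in paragraphs]

# All text under <p>, including the <strong> child's text
all_text = [p.text_content() for p in paragraphs]

print(own_text)  # ['Blahblah', 'Blahblah']
print(all_text)  # ['Blahblahcc', 'ccBlahblah']
```

The `text()` form returns "Blahblah" for both paragraphs regardless of where <strong> sits; `text_content()` is the better choice when the bold text should be kept.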