Question

How can I get the p tag text "Blahblah" in this case?

When the text of the p tag comes after a strong tag, lxml does not seem to recognize it.

<p class="user_p"><strong>cc</strong>Blahblah</p>

====Code====

from lxml import html
content="""
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>
"""
tree = html.fromstring(content)

p = tree.xpath('//div/p')

print(p[0].text)

print(p[1].text)

====Output====

Blahblah
None

Answer 1

In this HTML fragment

<p class="user_p"><strong>cc</strong>Blahblah</p>

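the text "Blahblah" is the value of the tail attribute of the <strong> element. lxml stores text that follows a child element's closing tag on that child as .tail, while the parent's .text holds only the text before its first child. That is why .text is None for the second <p>.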

Demo code:

from lxml import html

content = """
    <div>
     <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>"""

tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
print(s[0].tail)

Output:

Blahblah
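
If the goal is the text that sits directly inside each <p>, whether it comes before or after the <strong>, one option is to join the element's .text with the .tail of each child. The following is only a sketch: direct_text is a hypothetical helper name, not part of lxml, and the example parses with the standard library's xml.etree.ElementTree (whose .text/.tail model lxml follows) so it runs without extra installs. The same function should work unchanged on elements returned by lxml.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in; lxml elements expose the same .text/.tail


def direct_text(elem):
    """Join the text directly inside elem, skipping text inside child elements.

    Hypothetical helper for illustration; not an lxml function.
    """
    parts = [elem.text or ""]          # text before the first child (may be None)
    for child in elem:
        parts.append(child.tail or "")  # text after each child's closing tag
    return "".join(parts).strip()


content = """
<div>
  <p class="user_p">Blahblah<strong>cc</strong></p>
  <p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""

root = ET.fromstring(content.strip())
texts = [direct_text(p) for p in root.findall("p")]
print(texts)  # ['Blahblah', 'Blahblah']
```

With lxml itself, the XPath expression //div/p/text() selects the same direct text nodes, which may be simpler when only the strings are needed.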
