How can I get the p tag text "Blahblah" in this case?
When the text of a p tag comes after a strong tag, lxml does not return it as the p element's text:
<p class="user_p"><strong>cc</strong>Blahblah</p>
====Code====
from lxml import html
content="""
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""
# content is already a str, so no decode() call is needed on Python 3
tree = html.fromstring(content)
p = tree.xpath('//div/p')
print(p[0].text)
print(p[1].text)
====Output====
Blahblah
None
Answer 0 (score: 1)
In this HTML fragment
<p class="user_p"><strong>cc</strong>Blahblah</p>
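the text "Blahblah" is the value of the tail attribute of the <strong> element. In lxml, an element's text attribute holds only the text before its first child; text that follows a child's closing tag is stored in that child's tail.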
Demo code:
from lxml import html
content = """
<div>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>"""
tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
print(s[0].tail)  # text that follows the closing </strong> tag
Output:
Blahblah
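
If the position of the text relative to <strong> is not known in advance, one possible approach is to collect the p element's own text nodes with the XPath text() test, which covers both p.text and the tail of each child. The sketch below reuses the HTML from the question; the helper name own_text is made up for illustration and is not part of lxml. It also prints text_content(), which is available on lxml.html elements and includes the text inside <strong>, for comparison.

```python
from lxml import html

content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""

tree = html.fromstring(content)


def own_text(p):
    """Hypothetical helper: text directly inside p, skipping text inside child elements."""
    # ./text() selects the text-node children of p, i.e. p.text plus each child's tail
    return "".join(p.xpath("./text()")).strip()


paragraphs = tree.xpath("//div/p")
results = [own_text(p) for p in paragraphs]
all_text = [p.text_content() for p in paragraphs]

print(results)   # text of each p without the <strong> content
print(all_text)  # text of each p including the <strong> content
```

own_text gives "Blahblah" for both paragraphs regardless of where <strong> sits, while text_content() is the simpler choice when the text of nested elements should be kept as well.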