I'd like to use lxml to convert the content before the <br> of an element to a string. Suppose the following p element is retrieved by xpath(). Could anybody show me the command to convert everything before <br> to text (xxx yyy in this case)?
<p><span><strong>xxx</strong></span> <strong>yyy</strong><br> <span><img alt="" class="content-image content-image-right" src="yyy.jpg"></span>zzz</p>
Answer 0 (score: 0)
Iterate over the child elements of p and collect the recursive text content of each one until the br element is reached.
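from lxml import etree

# etree.fromstring() parses XML, so <br> and <img> are written as self-closing tags here
p = etree.fromstring("""<p><span><strong>xxx</strong></span> <strong>yyy</strong><br/> <span><img alt="" class="content-image content-image-right" src="yyy.jpg"/></span>zzz</p> """)

text = []
for child in p:  # iterating an element yields its direct children
    # method="text" returns the child's recursive text plus its tail, as bytes
    text.append(etree.tostring(child, method="text"))
    if child.tag == "br":
        break  # the br has already contributed its tail (the trailing space)
print(b"".join(text))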
Output:
b'xxx yyy '
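
The trailing space in that output is the tail text of the br element, which is appended before the loop breaks. The loop also does not look at p.text, so any text placed directly before the first child would be skipped. A minimal sketch of a variant that stops before the br and returns a str is shown below. The helper name text_before_br is hypothetical and not part of lxml, and the XPath expression at the end is offered as one possible alternative, not as the answer author's method.

```python
from lxml import etree


def text_before_br(element):
    """Return the text inside `element` that precedes its first <br> child.

    Hypothetical helper for illustration; not an lxml API.
    """
    # Start with the text that sits before the first child, if there is any.
    parts = [element.text or ""]
    for child in element:
        if child.tag == "br":
            break  # stop before the br so its tail is not collected
        # method="text" gives the child's recursive text plus its tail;
        # encoding="unicode" makes tostring() return str instead of bytes.
        parts.append(etree.tostring(child, method="text", encoding="unicode"))
    return "".join(parts)


p = etree.fromstring(
    '<p><span><strong>xxx</strong></span> <strong>yyy</strong><br/> '
    '<span><img alt="" class="content-image content-image-right" '
    'src="yyy.jpg"/></span>zzz</p>'
)
print(text_before_br(p))  # xxx yyy

# Possible XPath 1.0 alternative: every text node at or below the nodes
# that precede the first br child of p.
xpath_text = "".join(
    p.xpath("br[1]/preceding-sibling::node()/descendant-or-self::text()")
)
print(xpath_text)  # xxx yyy
```

If the p element has no br child, text_before_br returns the text of all children, while the XPath expression selects nothing and yields an empty string.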