How to convert the content before <br/> to text?

Date: 2018-07-25 05:24:49

Tags: xpath lxml

I'd like to use lxml to convert the part of an element that comes before its <br> into a string. Suppose the following p element has been retrieved with xpath(). Could anybody show me the code to convert the content before the <br> to text (xxx yyy in this case)?

<p><span><strong>xxx</strong></span> <strong>yyy</strong><br> <span><img alt="" class="content-image content-image-right" src="yyy.jpg"></span>zzz</p> 
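Note that the snippet above uses unclosed <br> and <img> tags, so it is HTML rather than well-formed XML, and etree.fromstring() would reject it as written. A minimal sketch, assuming the markup is parsed with lxml.html (not mentioned in the question) and the p element is then selected with xpath(), shows why p.text alone is not enough: "xxx" and "yyy" live inside child elements, and the space between them is the tail of the first <span>.

```python
import lxml.html

# The markup exactly as given in the question; lxml.html's HTML parser
# accepts the unclosed void elements <br> and <img>.
html = ('<p><span><strong>xxx</strong></span> <strong>yyy</strong><br> '
        '<span><img alt="" class="content-image content-image-right" src="yyy.jpg"></span>zzz</p>')

doc = lxml.html.document_fromstring(html)
p = doc.xpath("//p")[0]  # the p element "retrieved by xpath()"

print(repr(p.text))  # text directly inside <p> before its first child
# Each child element, its recursive text, and its tail (the text that follows it inside <p>).
for child in p:
    print(child.tag, repr(child.text_content()), repr(child.tail))
```

The text before the <br> is therefore spread over p.text, the children before <br>, and their tails, which is what the answer below collects.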

1 answer:

Answer 0 (score: 0)

Iterate over the children of the p element and collect the recursive text content of each one until you reach the br element.

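from lxml import etree

p = etree.fromstring("""<p><span><strong>xxx</strong></span> <strong>yyy</strong><br/> <span><img alt="" class="content-image content-image-right" src="yyy.jpg"/></span>zzz</p>""")

# Start with any text placed directly inside <p> before its first child (None here).
text = [p.text or ""]
for child in p:  # iterating an element yields its children; getchildren() is deprecated
    if child.tag == "br":
        # Stop before appending the <br>, so the text after it (its tail) is excluded.
        break
    # method="text" returns the child's recursive text plus its tail.
    text.append(etree.tostring(child, method="text", encoding="unicode"))

print("".join(text))

Output:

xxx yyy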
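An alternative sketch (not part of the original answer) does the selection in XPath itself, using the preceding-sibling axis: take every sibling node before the first <br> (elements and text nodes alike), then every text node at or below them. lxml returns the resulting node-set in document order, so the pieces can simply be joined.

```python
from lxml import etree

p = etree.fromstring('<p><span><strong>xxx</strong></span> <strong>yyy</strong><br/> '
                     '<span><img alt="" class="content-image content-image-right" src="yyy.jpg"/></span>zzz</p>')

# br[1]: the first <br> child of p.
# preceding-sibling::node(): every element and text node before it, including p's leading text.
# descendant-or-self::text(): the text nodes inside (or equal to) those nodes.
nodes = p.xpath("br[1]/preceding-sibling::node()/descendant-or-self::text()")
result = "".join(nodes)
print(result)
```

One difference from the loop: if the p element contains no <br>, this expression returns an empty list rather than all of the element's text.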