>>> from lxml import html
>>> html.tostring(html.fromstring('<div>1</div><div>2</div>'))
'<div><div>1</div><div>2</div></div>' # I don't want the outer <div>
>>> html.tostring(html.fromstring('I am pure text'))
'<p>I am pure text</p>' # I don't need the extra <p>
How can I avoid the outer <div> and <p> that lxml adds?
Answer 0 (score: 2)
By default, lxml creates a parent div when the string contains multiple elements.
You can work with the individual fragments instead:
from lxml import html

test_cases = ['<div>1</div><div>2</div>', 'I am pure text']
for test_case in test_cases:
    # fragments_fromstring returns a list; leading text, if any, is a plain string
    fragments = html.fragments_fromstring(test_case)
    print(fragments)
    output = ''
    for fragment in fragments:
        if isinstance(fragment, str):
            # Bare text comes back as str and is appended unchanged
            output += fragment
        else:
            # Elements are serialized to bytes, then decoded to str
            output += html.tostring(fragment).decode('UTF-8')
    print(output)