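I am parsing an HTML file with bs4 and want to check whether a given node comes after another given node. Both nodes come from lists returned by two different find_all queries. They are not necessarily the first or last occurrences found, and they are not necessarily at the same depth in the tree, i.e. the order of their depths is unknown.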
I tried checking with:
if el1 in el2.next_elements:
but it seems that "in .next_elements" and "in .previous_elements" are not mutually exclusive, which I don't understand. Also, "el in el.next_elements" sometimes returns True.
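A likely explanation for the behaviour described above: Beautiful Soup treats two Tag objects as equal when they represent the same markup (same name, attributes and contents), and Python's "in" operator tests with ==. Two look-alike tags at different positions can therefore satisfy both checks. The sketch below compares by identity instead; the helper name comes_after and the sample markup are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Two <p> tags with identical markup, at different positions in the tree.
html = "<div><p>same</p><span>x</span><p>same</p></div>"
soup = BeautifulSoup(html, "html.parser")
first_p, second_p = soup.find_all("p")

# Equality is based on markup, identity on the actual object.
print(first_p == second_p)               # True
print(first_p is second_p)               # False
# `in` uses ==, so the later look-alike makes this True.
print(first_p in first_p.next_elements)  # True


def comes_after(later, earlier):
    """Return True if `later` is the same object as a node following `earlier`."""
    return any(node is later for node in earlier.next_elements)


print(comes_after(second_p, first_p))  # True
print(comes_after(first_p, second_p))  # False
print(comes_after(first_p, first_p))   # False
```

Note that next_elements also walks into the descendants of the starting node, so "after" here means "starts later in document order".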
For example, with this markup:
<tag1> This is some node </tag1>
<tag2> This node was extracted by first find_all </tag2>
<tag2> This node was also extracted by first find_all </tag2>
<tag3>
<tag4> This node was extracted by second find_all </tag4>
</tag3>
<tag5>
<tag2> This node was extracted by first find_all </tag2>
</tag5>
<tag3> This node was extracted by second find_all </tag3>
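My overall task is to extract text in maximal non-overlapping intervals, each starting at an element from the first list and ending at an element from the second list. Here, the result would be: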
['''This node was extracted by first find_all
This node was also extracted by first find_all
This node was extracted by second find_all''',
'''This node was extracted by first find_all
This node was extracted by second find_all''']
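One possible way to get that result is a single pass over the document in order, matching elements by identity rather than equality. This is only a sketch under several assumptions: the function name extract_intervals is hypothetical, only the text of the listed elements is collected, listed elements are not nested inside each other, an interval is closed when the next start element appears after at least one end element, and the first line of the sample markup is taken to close with </tag1>.

```python
from bs4 import BeautifulSoup


def extract_intervals(soup, starts, ends):
    """Group listed elements into intervals, walking the tree in document order."""
    # id() gives identity-based membership; the objects stay alive in the tree.
    start_ids = {id(el) for el in starts}
    end_ids = {id(el) for el in ends}
    intervals, current, seen_end = [], [], False
    for node in soup.descendants:
        if id(node) in start_ids:
            if seen_end:
                # A start element after an end element closes the previous interval.
                intervals.append("\n".join(current))
                current, seen_end = [], False
            current.append(node.get_text(strip=True))
        elif id(node) in end_ids and current:
            # End elements only count once an interval has been opened.
            current.append(node.get_text(strip=True))
            seen_end = True
    if seen_end:
        intervals.append("\n".join(current))
    return intervals


html = """
<tag1> This is some node </tag1>
<tag2> This node was extracted by first find_all </tag2>
<tag2> This node was also extracted by first find_all </tag2>
<tag3>
<tag4> This node was extracted by second find_all </tag4>
</tag3>
<tag5>
<tag2> This node was extracted by first find_all </tag2>
</tag5>
<tag3> This node was extracted by second find_all </tag3>
"""
soup = BeautifulSoup(html, "html.parser")
starts = soup.find_all("tag2")
# Stand-in for the second query: the <tag4> and the last <tag3>.
ends = [soup.find("tag4"), soup.find_all("tag3")[-1]]

result = extract_intervals(soup, starts, ends)
for interval in result:
    print(repr(interval))
```

If all the text between the start and end elements is wanted as well, the loop could collect every string node while an interval is open; the version above keeps only the listed elements because that is what the expected output shows.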