Check whether a node comes after another node in BeautifulSoup

Date: 2018-07-12 15:04:53

Tags: python beautifulsoup

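I am using bs4 to parse an HTML file. I want to check whether a given node comes after another given node. Both nodes are in lists returned by two different find_all queries. They are not necessarily the first or last occurrence found, and they are not necessarily at the same depth in the tree, i.e. which one is nested deeper is not known in advance.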

I tried checking it like this:

if el1 in el2.next_elements:
    ...  # treat el1 as coming after el2

However, `in .next_elements` and `in .previous_elements` do not seem to be mutually exclusive, which I don't understand. Also, `el in el.next_elements` sometimes returns True.
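A likely explanation, offered as a hedged sketch: `x in some_generator` compares items with `is` or `==`, and bs4's `Tag.__eq__` treats two tags as equal when they have the same name, attributes and contents. In the example below, the first and last `<tag2>` contain identical text, so a membership test can "find" a tag in `next_elements` because a later look-alike matches, and the same tag can then appear to be both before and after another node. Comparing with `is` avoids this. The helpers `comes_after` and `position_map` are hypothetical names for illustration, not part of bs4.

```python
from bs4 import BeautifulSoup

html = "<a>x</a><b>y</b><a>x</a>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

# Structural equality: same name, attrs and contents -> equal, yet distinct objects.
print(first == second, first is second)  # True False

# `in` falls back to ==, so this is True only because `second` looks like `first`.
print(first in first.next_elements)  # True


def comes_after(node, reference):
    """Hypothetical helper: True if `node` appears after `reference` in document order.

    Uses identity (`is`) instead of `==`, so look-alike tags do not match.
    A descendant of `reference` also counts as "after" it.
    """
    return any(el is node for el in reference.next_elements)


def position_map(soup):
    """Hypothetical helper: map id(node) -> document-order index, built in one pass."""
    return {id(el): i for i, el in enumerate(soup.descendants)}


print(comes_after(second, first), comes_after(first, first))  # True False
pos = position_map(soup)
print(pos[id(first)] < pos[id(second)])  # True
```

`comes_after` walks the tree once per query. When many pairs from two find_all lists have to be compared, the position map pays for one pass up front and then answers each "is A after B?" with two dictionary lookups.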

For example:

<tag1> This is some node </tag1>
<tag2> This node was extracted by first find_all </tag2>
<tag2> This node was also extracted by first find_all </tag2>
<tag3>
    <tag4> This node was extracted by second find_all </tag4>
</tag3>
<tag5>
    <tag2> This node was extracted by first find_all </tag2>
</tag5>
<tag3> This node was extracted by second find_all </tag3>

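My overall task is to extract the text in maximal non-overlapping intervals, each starting at an element of the first list and ending at an element of the second list. Here, the result would be: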

['''This node was extracted by first find_all 
This node was also extracted by first find_all 
This node was extracted by second find_all''',
'''This node was extracted by first find_all 
This node was extracted by second find_all''']
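One possible reading of the task, sketched under assumptions: walk the tree once in document order, open an interval at the first element from the first list, and close it at the first element from the second list that follows; the next interval opens at the next first-list element after that. This greedy rule reproduces the expected output for the example above, but it is an interpretation of "maximal, non-overlapping", not something the question confirms. Membership is tested by `id()` because bs4 tags compare equal by content under `==`. The `ends` list is a stand-in for the second find_all, whose actual query isn't given, and `extract_intervals` / `is_inside` are hypothetical names. Text pieces are stripped and joined with newlines, so the trailing spaces in the expected output are dropped.

```python
from bs4 import BeautifulSoup, NavigableString


def is_inside(node, ancestor):
    """True if `node` is `ancestor` or lies somewhere inside it (identity checks)."""
    return node is ancestor or any(p is ancestor for p in node.parents)


def extract_intervals(soup, starts, ends):
    """Hypothetical helper: greedy, non-overlapping start..end text intervals.

    An interval opens at the first node from `starts` (while none is open) and
    closes at the next node from `ends`; the closing node's own text is included.
    """
    start_ids = {id(t) for t in starts}  # identity, not ==, so look-alike tags differ
    end_ids = {id(t) for t in ends}
    intervals = []
    current = None    # text pieces of the open interval, or None if none is open
    closed_at = None  # end tag whose subtree has already been consumed

    for node in soup.descendants:  # document order
        if closed_at is not None:
            if is_inside(node, closed_at):
                continue  # its text was already taken via stripped_strings
            closed_at = None
        if current is None:
            if id(node) not in start_ids:
                continue  # outside any interval: ignore
            current = []
        if isinstance(node, NavigableString):
            text = node.strip()
            if text:  # skip whitespace-only strings between tags
                current.append(text)
        elif id(node) in end_ids:
            current.extend(node.stripped_strings)
            intervals.append("\n".join(current))
            current = None
            closed_at = node
    return intervals


html = """<tag1> This is some node </tag1>
<tag2> This node was extracted by first find_all </tag2>
<tag2> This node was also extracted by first find_all </tag2>
<tag3>
    <tag4> This node was extracted by second find_all </tag4>
</tag3>
<tag5>
    <tag2> This node was extracted by first find_all </tag2>
</tag5>
<tag3> This node was extracted by second find_all </tag3>"""

soup = BeautifulSoup(html, "html.parser")
starts = soup.find_all("tag2")
# Stand-in for the second find_all (the real query isn't given in the question).
ends = [soup.find("tag4"), soup.find_all("tag3")[-1]]

result = extract_intervals(soup, starts, ends)
for text in result:
    print(text, end="\n\n")
```

Because each interval closes at the first end element after its start, a second-list element that follows it before the next first-list element is ignored. If "maximal" should instead stretch each interval to the last such element, the closing step would need to look ahead.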

0 answers:

No answers yet.