Question

I am trying to parse a very large HTML document that looks something like this:

<div class="reportsubsection n">
    <h2> 1.4 Test </h2>
    <p> insert text here </p>
    <table> crazy table thing here </table>
</div>
<div class="reportsubsection n">
    <h2> 1.4 Finding </h2>
    <p> insert text here </p>
    <table> crazy table thing here </table>
</div>

I need to pick out the second div based on its h2 text, "Finding". I was able to pull out the div tags with:

divTag = soup.find("div", {"id": "reportsubsection"})

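but I don't know how to narrow it down from there. Based on other posts I found, I was able to locate the specific text, but I need to print the entire div section that contains it. Basically, if a div has class=reportsubsection and its h2 contains the word "Finding", print the whole div.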

I tried the following, based on some other posts I found, but it returned no results.

divTag = soup.find("div", {"id": "reportsubsection"})
for reportsubsection in soup.select('div#reportsubsection #reportsubsection'):
    if not reportsubsection.findAll('h2', text=re.compile('Finding')):
        continue
    print divTag

Answer 1

You are searching for a div element whose id is set to "reportsubsection". In your HTML, however, it is the div's class attribute, not its id, that contains "reportsubsection".

Try this:
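A minimal sketch of that fix, assuming BeautifulSoup 4 with the built-in html.parser and a cleaned-up copy of the question's markup. The names html, has_finding_heading, and matching_divs are illustrative and do not come from the original post. It selects divs by class instead of id, then keeps only those whose h2 contains "Finding".

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question's HTML (cleaned up; an assumption).
html = """
<div class="reportsubsection n">
    <h2> 1.4 Test </h2>
    <p> insert text here </p>
    <table> crazy table thing here </table>
</div>
<div class="reportsubsection n">
    <h2> 1.4 Finding </h2>
    <p> insert text here </p>
    <table> crazy table thing here </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def has_finding_heading(div):
    """Return True if the div contains an h2 whose text includes 'Finding'."""
    h2 = div.find("h2")
    return h2 is not None and "Finding" in h2.get_text()


# Match on the class attribute, not id. class_ matches a single class name
# even when the attribute lists several ("reportsubsection n").
matching_divs = [
    div
    for div in soup.find_all("div", class_="reportsubsection")
    if has_finding_heading(div)
]

# Print each matching div in full, including its h2, p, and table.
for div in matching_divs:
    print(div)
```

The CSS selector in the question, 'div#reportsubsection #reportsubsection', also targets an id, which is why it matched nothing; the class-based equivalent would be soup.select('div.reportsubsection').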


Beautiful Soup DIV parsing based on H2
