I'm working with the following "block" of HTML:
<div class="marketing-directories-results">
<ul>
<li>
<div class="contact-details">
<h2>
A I I Insurance Brokerage of Massachusetts Inc
</h2>
<br/>
<address>
183 Davis St
<br/>
East Douglas
<br/>
Massachusetts
<br/>
U S A
<br/>
MA 01516-113
</address>
<p>
<a href="http://www.agencyint.com">
www.agencyint.com
</a>
</p>
</div>
<span data-toggle=".info-cov-0">
Additional trading information
<i class="icon plus">
</i>
</span>
<ul class="result-info info-cov-0 cc">
<li>
<strong>
Accepts Business From:
</strong>
<ul class="cc">
<li>
U.S.A
</li>
</ul>
</li>
<li>
<strong>
Classes of business
</strong>
<ul class="cc">
<li>
Engineering
</li>
<li>
NM General Liability (US direct)
</li>
<li>
Property D&F (US binder)
</li>
<li>
Terrorism
</li>
</ul>
</li>
<li>
<strong>
Disclaimer:
</strong>
<p>
Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:
</p>
<p>
it is the responsibility of the parties, including the coverholder and any Lloyd's managing agent appointing them to ensure that the coverholder complies with all local regulatory and legal requirements; and
</p>
<p>
the coverholder may not provide cover for all classes they are approved to underwrite in all territories where they have approval.
</p>
</li>
</ul>
</li>
<li>
<div class="contact-details">
<h2>
ABCO Insurance Underwriters Inc
</h2>
<br/>
<address>
ABCO Building, 350 Sevilla Avenue, Suite 201
<br/>
Coral Gables
<br/>
Florida
<br/>
U S A
<br/>
33134
</address>
<p>
<a href="http://www.abcoins.com">
www.abcoins.com
</a>
</p>
</div>
<span data-toggle=".info-cov-1">
Additional trading information
<i class="icon plus">
</i>
</span>
<ul class="result-info info-cov-1 cc">
<li>
<strong>
Accepts Business From:
</strong>
<ul class="cc">
<li>
U.S.A
</li>
</ul>
</li>
<li>
<strong>
Classes of business
</strong>
<ul class="cc">
<li>
Property D&F (US binder)
</li>
<li>
Terrorism
</li>
</ul>
</li>
<li>
<strong>
Disclaimer:
</strong>
<p>
Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:
</p>
<p>
it is the responsibility of the parties, including the coverholder and any Lloyd's managing agent appointing them to ensure that the coverholder complies with all local regulatory and legal requirements; and
</p>
<p>
the coverholder may not provide cover for all classes they are approved to underwrite in all territories where they have approval.
</p>
</li>
</ul>
</li>
</ul>
</div>
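I'm scraping several data points from this HTML. The ones giving me trouble are the "Accepts Business From:" and "Classes of business" values. I can get the "Accepts Business From:" value no matter what order it appears in: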
try:
    li_area = company.find('ul', class_='result-info info-cov-' +
                           str(company_counter) + ' cc')
    li_stuff = li_area.find_all('li')
    for li in li_stuff:
        if li.strong.text.strip() == 'Accepts Business From:':
            business_final = li.find('li').text.strip()
except AttributeError:
    pass
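Note: the company variable is the BeautifulSoup object containing the HTML I pasted above.
Note: the class name changes for each record on the page. I've only included a small sample of records in the HTML above to keep things concise.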
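When I try the same block of code, this time with the condition li.strong.text.strip() == 'Classes of business' instead of 'Accepts Business From:', the code doesn't seem to detect that strong tag. It only ever picks up 'Accepts Business From:'. Is my for loop wrong, so that it isn't actually iterating over every <li> tag containing these different strong tags? Or could the real value of this strong tag be something other than 'Classes of business'? (I did copy this value directly from the site's HTML.)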
Any help you can provide is greatly appreciated.
Answer (score: 1)
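You're getting the text for 'Accepts Business From:' but not for 'Classes of business' because the try-except is in the wrong place.
find_all('li') searches recursively, so li_stuff contains the nested value <li> tags as well as the outer ones. In the second iteration of the for li in li_stuff: loop, li becomes <li>U.S.A</li>. That tag has no <strong> child, so li.strong returns None and li.strong.text raises an AttributeError. With your current try-except, that error is caught outside the for loop and simply passed. As a result, the loop never reaches its third iteration, which is where it would get the text for 'Classes of business'.
To keep looping after an error is caught, put the try-except inside the loop:
for li in li_stuff:
    try:
        if li.strong.text.strip() == 'Accepts Business From:':
            business_final = li.find('li').text.strip()
            print('Accepts Business From:', business_final)
        if li.strong.text.strip() == 'Classes of business':
            business_final = li.find('li').text.strip()
            print('Classes of business:', business_final)
    except AttributeError:
        pass  # or you can use 'continue' too.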
Output:
Accepts Business From: U.S.A
Classes of business: Engineering
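You can check that find_all('li') also returns the nested value tags with the minimal sketch below. It runs against a trimmed copy of the first record's result-info list. The trimmed HTML string is an assumption: it keeps only the two rows of interest. The script records the <strong> label that BeautifulSoup finds for each <li> in li_stuff.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one record's "result-info" list from the question
# (assumption: only the two rows of interest are kept).
HTML = """
<ul class="result-info info-cov-0 cc">
  <li>
    <strong>Accepts Business From:</strong>
    <ul class="cc"><li>U.S.A</li></ul>
  </li>
  <li>
    <strong>Classes of business</strong>
    <ul class="cc"><li>Engineering</li><li>Terrorism</li></ul>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
li_area = soup.find("ul", class_="result-info info-cov-0 cc")
li_stuff = li_area.find_all("li")  # recursive by default

# Record the <strong> label of every <li> that find_all() returned.
# The nested value <li> tags have no <strong>, so li.strong is None for them.
labels = [li.strong.text.strip() if li.strong is not None else None
          for li in li_stuff]
print(labels)

# Calling .text on that None is what raises the AttributeError.
try:
    li_stuff[1].strong.text
except AttributeError as exc:
    print("Second <li> fails:", exc)
```

The printed list is ['Accepts Business From:', None, 'Classes of business', None, None]. Each None is a nested value <li> for which li.strong is None, so li.strong.text raises an AttributeError.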
However, since 'Classes of business' has several values, you can change the second if block to this to get all of them:
if li.strong.text.strip() == 'Classes of business':
    business_final = ', '.join([x.text.strip() for x in li.find_all('li')])
    print('Classes of business:', business_final)
Output:
Accepts Business From: U.S.A
Classes of business: Engineering, NM General Liability (US direct), Property D&F (US binder), Terrorism
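If you'd rather not rely on catching AttributeError at all, there is an alternative that is not the approach taken in the answer above. You can visit only the direct child <li> rows with find_all('li', recursive=False) and collect each row's values into a dict. This is a sketch under a few assumptions. The HTML string is a trimmed copy of the first record, with the Disclaimer text shortened and the ampersand written as &amp;. parse_trading_info is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first record's "result-info" list (assumption: the
# Disclaimer is shortened; "&amp;" parses to a plain "&").
HTML = """
<ul class="result-info info-cov-0 cc">
  <li><strong>Accepts Business From:</strong>
    <ul class="cc"><li>U.S.A</li></ul>
  </li>
  <li><strong>Classes of business</strong>
    <ul class="cc">
      <li>Engineering</li>
      <li>NM General Liability (US direct)</li>
      <li>Property D&amp;F (US binder)</li>
      <li>Terrorism</li>
    </ul>
  </li>
  <li><strong>Disclaimer:</strong>
    <p>Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:</p>
  </li>
</ul>
"""


def parse_trading_info(li_area):
    """Map each <strong> label to the values listed in its nested <ul>.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    info = {}
    # recursive=False returns only the <li> tags that are direct children
    # of li_area, so the nested value <li> tags are never treated as rows.
    for li in li_area.find_all("li", recursive=False):
        strong = li.find("strong")
        values = li.find("ul")
        if strong is None or values is None:
            continue  # e.g. the Disclaimer row, which holds <p> tags instead
        info[strong.get_text(strip=True)] = [
            v.get_text(strip=True) for v in values.find_all("li")
        ]
    return info


soup = BeautifulSoup(HTML, "html.parser")
li_area = soup.find("ul", class_="result-info info-cov-0 cc")
info = parse_trading_info(li_area)
print("Accepts Business From:", ", ".join(info["Accepts Business From:"]))
print("Classes of business:", ", ".join(info["Classes of business"]))
```

Run against that sample, it prints the same two lines as the final output above. Skipping any row without a nested <ul> handles the Disclaimer row without exception handling.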