I want to print the "Civil Penalty" section of EPA settlement pages such as https://www.epa.gov/enforcement/chevron-settlement-information-sheet or https://www.epa.gov/enforcement/ngl-crude-logistics-llc-clean-air-act-settlement.
Given the following HTML source:
<h2 id="civil">Civil Penalty</h2>
<p>Chevron U.S.A. will pay a $2.95 million civil penalty, of which $2,492,750 will be paid to the United States and $457,250 to the State of Mississippi.</p>
I want to get: Chevron U.S.A. will pay a $2.95 million civil penalty...
All settlement information sheets have the same structure, for example:
<h2 id="civil">Civil Penalty</h2>
<p>NGL will pay a civil penalty of $25 million. The penalty is based, in part, on the company’s limited ability to pay a larger penalty.</p>
I found "Get an element before a string with Beautiful Soup", which is similar but not quite the same as my problem.
Here is my code skeleton:
import requests
from bs4 import BeautifulSoup
import sys

for i in ['chevron-settlement-information-sheet', 'ngl-crude-logistics-llc-clean-air-act-settlement', 'derive-systems-clean-air-act-settlement']:
    page = requests.get("https://www.epa.gov/enforcement/"+i)
    soup = BeautifulSoup(page.content, 'html.parser')
    data = []
    for result in soup.find_all('h2', id='civil'):
        data.append(result)
    print(data)
How do I print the <p> that comes directly after the <h2 id="civil"> heading?
Answer 0 (score: 1)
You can try the adjacent sibling combinator, +.
p=soup.select('#civil + p')
print(p[0].getText())
This selects only a p element that is the immediate next sibling of #civil.
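As a minimal offline sketch of this selector, the snippet below parses the Chevron fragment quoted in the question instead of fetching the live page. The `html` string and its trailing paragraph are assumptions standing in for the downloaded page. It uses `select_one`, which returns the first match or `None`, so a page without the section does not raise an `IndexError` the way `p[0]` would.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question; it stands in for page.content.
# The last <p> is a made-up extra paragraph, added only to show it is not selected.
html = """
<h2 id="civil">Civil Penalty</h2>
<p>Chevron U.S.A. will pay a $2.95 million civil penalty, of which $2,492,750 will be paid to the United States and $457,250 to the State of Mississippi.</p>
<p>Some later paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# '#civil + p' matches a <p> only when it is the element immediately after #civil.
penalty = soup.select_one("#civil + p")
penalty_text = penalty.get_text(strip=True) if penalty is not None else None
print(penalty_text)
```

If another element sits between the heading and the paragraph, `#civil + p` matches nothing; the general sibling combinator `#civil ~ p` is the looser alternative in that case.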
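Answer 1 (score: 0)
One reason you may not be getting the result you want is that you appended /history to the URL, which leads to a 404 error page. If you remove that part and then use findNext('p') to get the next paragraph element after <h2 id="civil"> on the page, you get the expected result: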
import requests
from bs4 import BeautifulSoup

for url in ['chevron-settlement-information-sheet', 'ngl-crude-logistics-llc-clean-air-act-settlement', 'derive-systems-clean-air-act-settlement']:
    page = requests.get("https://www.epa.gov/enforcement/" + url)
    soup = BeautifulSoup(page.content, 'html.parser')
    result = soup.find('h2', {'id': 'civil'}).findNext('p')
    print(result.text)
Output:
Chevron U.S.A. will pay a $2.95 million civil penalty, of which $2,492,750 will be paid to the United States and $457,250 to the State of Mississippi.
NGL will pay a civil penalty of $25 million. The penalty is based, in part, on the company’s limited ability to pay a larger penalty.
Derive will pay a civil penalty of $300,000, as the company has limited financial ability to pay a higher penalty.
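A slightly more defensive variant might look like the sketch below. It is not part of the answer above: the helper name `civil_penalty_text` and both HTML fragments are hypothetical, chosen so the example runs without network access. It uses `find_next`, the current spelling of the legacy alias `findNext`, and returns `None` when the heading is missing (as on an error page) instead of raising `AttributeError`.

```python
from bs4 import BeautifulSoup


def civil_penalty_text(html):
    """Return the text of the first <p> after <h2 id="civil">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h2", id="civil")
    if heading is None:
        # For example, a 404 page has no such heading.
        return None
    # find_next walks forward in document order, so it is not limited to siblings.
    paragraph = heading.find_next("p")
    return paragraph.get_text(strip=True) if paragraph is not None else None


# Hypothetical fragments for illustration only.
with_section = (
    '<h2 id="civil">Civil Penalty</h2>'
    '<div class="note"></div>'
    "<p>NGL will pay a civil penalty of $25 million.</p>"
)
error_page = "<h1>Page not found</h1><p>Sorry.</p>"

print(civil_penalty_text(with_section))
print(civil_penalty_text(error_page))
```

Because `find_next` searches everything that follows the heading, it still finds the paragraph when another element sits in between, which the `#civil + p` selector would not.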