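Hi, I'm using BS4 to scrape SIC codes and their descriptions. The code below gets me the company name and the SIC code, but I can't figure out how to scrape the description that follows the code, shown below as it appears in both the inspect-element view and view source.
To be clear, the text I want is "STATE COMMERCIAL BANKS" and "LABORATORY ANALYTICAL INSTRUMENTS".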
https://www.sec.gov/cgi-bin/browse-edgar?CIK=866054&owner=exclude&action=getcompany&Find=Search
<div class="companyInfo">
<span class="companyName">COMMERCIAL NATIONAL FINANCIAL CORP /PA <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&CIK=0000866054&owner=exclude&count=40">0000866054 (see all company filings)</a></span>
<p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&SIC=6022&owner=exclude&count=40">6022</a> - STATE COMMERCIAL BANKS<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&State=PA&owner=exclude&count=40">PA</a> | State of Inc.: <strong>PA</strong> | Fiscal Year End: 1231<br />(Office of Finance)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&CIK=0000866054"><b>insider transactions</b></a> for this <b>issuer</b>.
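for cik_num in cik_num_list:
    try:
        url = r"https://www.sec.gov/cgi-bin/browse-edgar?CIK={}&owner=exclude&action=getcompany".format(cik_num)
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        try:
            # Company name: text of the <span> inside <div class="companyInfo">
            comp_name = soup.find_all('div', {'class':'companyInfo'})[0].find('span').text
            # SIC code: text of the first <a> inside <p class="identInfo">
            sic_code = soup.find_all('p', {'class':'identInfo'})[0].find('a').text
        except (IndexError, AttributeError):
            # Handler not shown in the original post; skipping this CIK is an assumption
            continue
    except requests.RequestException:
        # Handler not shown in the original post; skipping this CIK is an assumption
        continue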
Answer 0 (score: 1)
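import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/cgi-bin/browse-edgar?CIK=866054&owner=exclude&action=getcompany&Find=Search'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The description is a bare text node (" - STATE COMMERCIAL BANKS") right after the SIC <a>.
# Grab that node, then split once on whitespace to drop the leading "-".
# string=True is the current spelling of the older text=True argument.
sic_code_desc = soup.select_one('.identInfo').a.find_next_sibling(string=True).split(maxsplit=1)[-1]
print(sic_code_desc)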
Prints:
STATE COMMERCIAL BANKS
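As an aside on why that line works: the description is not wrapped in any tag, so `.find('a').text` never sees it. It sits in a bare text node that is the next sibling of the SIC link. The sketch below illustrates this offline, assuming a trimmed copy of the `identInfo` paragraph from the question (the shortened `href` values are an assumption made for readability, not the page's exact markup).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the identInfo paragraph from the question (assumption:
# hrefs shortened for readability). No network access is needed.
html = (
    '<p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: '
    '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=6022">6022</a>'
    ' - STATE COMMERCIAL BANKS<br />State location: '
    '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=PA">PA</a></p>'
)
soup = BeautifulSoup(html, 'html.parser')

link = soup.select_one('.identInfo').a     # first <a> in the paragraph: the SIC code
raw = link.next_sibling                    # bare text node that follows the link
description = raw.split(maxsplit=1)[-1]    # split once on whitespace, keep the part after "-"

print(link.text)        # the SIC code
print(repr(str(raw)))   # the untouched text node, including the leading " - "
print(description)      # the description on its own
```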
For url = 'https://www.sec.gov/cgi-bin/browse-edgar?CIK=1090872&owner=exclude&action=getcompany&Find=Search', the same script prints:
LABORATORY ANALYTICAL INSTRUMENTS
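To use this inside the loop from the question, it may help to wrap the lookup in a small helper that returns both the code and the description and tolerates pages where either is missing. The following is only a sketch: `extract_sic` is a hypothetical name, and the idea that some pages might lack a SIC link (so that the first `<a>` would be the state link instead) is an assumption, not something confirmed in the question. The second HTML snippet is made up to exercise that case.

```python
from bs4 import BeautifulSoup

def extract_sic(soup):
    """Return (sic_code, sic_description); either value is None if not found.

    Hypothetical helper, not part of the original answer.
    """
    ident = soup.select_one('p.identInfo')
    if ident is None:
        return None, None
    # Assumption: the SIC link is the one whose href carries a "SIC=" parameter.
    link = ident.find('a', href=lambda h: h and 'SIC=' in h)
    if link is None:
        return None, None
    sibling = link.next_sibling
    desc = None
    if isinstance(sibling, str):  # NavigableString is a str subclass; a Tag is not
        # Strip whitespace, drop the leading "-", then strip again
        desc = sibling.strip().lstrip('-').strip() or None
    return link.get_text(strip=True), desc

# Offline check with a trimmed copy of the markup from the question.
with_sic = BeautifulSoup(
    '<p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: '
    '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=6022">6022</a>'
    ' - STATE COMMERCIAL BANKS<br />State location: '
    '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=PA">PA</a></p>',
    'html.parser',
)
# Made-up markup with no SIC link, to show the fallback.
without_sic = BeautifulSoup(
    '<p class="identInfo">State location: '
    '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=PA">PA</a></p>',
    'html.parser',
)

print(extract_sic(with_sic))
print(extract_sic(without_sic))
```

Inside the loop, `sic_code, sic_desc = extract_sic(soup)` could then replace the `sic_code = ...` line, right after `soup` is built.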