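I'm trying to scrape the names of all the companies listed on this site. Each page (14 in total) shows the names of 80 companies. Every URL ends with start=241&count=80&first=2009&last=2018, where start is the first row shown on the page. I'm trying to step through the results 80 companies at a time, so that the loop visits each page and scrapes the company names. However, every time I try, I get this error on the second pass through the loop: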
  File "beautiful_soup_2.py", line 10, in <module>
    name_table = (soup.findAll('table')[4])
  File "C:\Users\adamm\Downloads\Python\lib\site-packages\bs4\element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'findAll'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
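The ResultSet message can be reproduced without any network access. The sketch below assumes only that bs4 is installed. Its HTML string is a made-up stand-in for the EDGAR page, and "html.parser" replaces 'lxml' so that no extra parser package is needed.

It mirrors the line soup = soup(source_link, 'lxml') in the code below. Once the name bound to the BeautifulSoup class is reassigned to a parsed document, the next call through that name no longer parses anything. Calling a BeautifulSoup object is shorthand for find_all(), which returns a ResultSet, and a ResultSet has no findAll method.

```python
from bs4 import BeautifulSoup as soup
from bs4.element import ResultSet

# Made-up stand-in for one results page.
html = "<html><body><table><tr><td>Acme Corp</td></tr></table></body></html>"

# First pass: `soup` is still the BeautifulSoup class, so this parses the HTML.
# Assigning the result back to `soup` replaces the class with a parsed document.
soup = soup(html, "html.parser")
first_pass = soup

# Second pass: `soup` is now a BeautifulSoup *object*. Calling it is shorthand for
# find_all(), so it searches the old document instead of parsing new HTML and
# returns a ResultSet (a list subclass).
soup = soup(html, "html.parser")
second_pass = soup

print(type(first_pass).__name__)   # BeautifulSoup
print(type(second_pass).__name__)  # ResultSet

try:
    second_pass.findAll("table")   # the same call that fails in the traceback
except AttributeError as exc:
    print("AttributeError:", exc)
```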
However, if I remove the loop and manually enter the URLs for start=81, 161, 241, and so on, each one returns the list of companies on that page.
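The offsets follow a simple pattern. Page p (counting from 0) starts at row 1 + 80·p, so the 14 pages begin at 1, 81, 161, ..., 1041. That is exactly what range(1, 1042, 80) produces.

The sketch below builds the page URLs from the question's query string. The helper names start_offsets and page_urls are hypothetical.

```python
# Query URL from the question; `{}` is filled with each page's `start` offset.
BASE_URL = (
    "https://www.sec.gov/cgi-bin/srch-edgar"
    "?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa"
    "&start={}&count=80&first=2009&last=2018"
)

PAGE_SIZE = 80  # count=80 rows per page
NUM_PAGES = 14  # number of result pages the site shows


def start_offsets(num_pages=NUM_PAGES, page_size=PAGE_SIZE):
    """1-based first-row offsets for each page: 1, 81, 161, ..."""
    return [1 + page * page_size for page in range(num_pages)]


def page_urls(num_pages=NUM_PAGES, page_size=PAGE_SIZE):
    """One results URL per page."""
    return [BASE_URL.format(start) for start in start_offsets(num_pages, page_size)]


print(start_offsets()[:4])  # [1, 81, 161, 241]
print(start_offsets()[-1])  # 1041
print(len(page_urls()))     # 14
```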
My code so far:
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

for x in range(1,1042,80):
    sauce = ('https://www.sec.gov/cgi-bin/srch-edgar?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa&start={}&count=80&first=2009&last=2018'.format(x))
    source_link = urlopen(sauce).read()
    soup = soup(source_link, 'lxml')
    name_table = (soup.findAll('table')[4])
    table_rows = name_table.findAll('tr')
    for row in table_rows:
        cols = row.findAll('td')
        cols = [x.text.strip() for x in cols]
        print(cols)
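Below is a minimal sketch of the same loop with the name shadowing removed. Like the question, it assumes the results sit in the fifth <table> on each page (index 4). BeautifulSoup is imported under its own name, and each parsed page goes into a separate variable (page), so the class is still callable on every iteration.

Two deviations from the original are assumptions:
- "html.parser" is used instead of 'lxml' so no extra package is required ('lxml' works the same way if installed).
- The User-Agent header is a hypothetical placeholder, added because sec.gov may reject requests that don't identify themselves.

Parsing is split into extract_rows, a hypothetical helper name, so it can be checked against a local HTML string without network access.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup  # imported under its own name; never reassigned below

# Same query URL as in the question; only `start` changes from page to page.
BASE_URL = (
    "https://www.sec.gov/cgi-bin/srch-edgar"
    "?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa"
    "&start={}&count=80&first=2009&last=2018"
)

# Hypothetical placeholder: replace with your own name/contact details.
HEADERS = {"User-Agent": "example-scraper admin@example.com"}


def extract_rows(html, table_index=4):
    """Return the stripped <td> texts of every <tr> in the table at `table_index`.

    Index 4 is the question's assumption about where the results table sits.
    """
    page = BeautifulSoup(html, "html.parser")  # new name, so the class stays usable
    name_table = page.find_all("table")[table_index]
    return [
        [td.get_text().strip() for td in row.find_all("td")]
        for row in name_table.find_all("tr")
    ]


def scrape_all(starts=range(1, 1042, 80)):
    """Fetch every results page and print each row's cells, like the original loop."""
    for start in starts:
        request = Request(BASE_URL.format(start), headers=HEADERS)
        html = urlopen(request).read()
        for cols in extract_rows(html):
            print(cols)
```

Calling scrape_all() runs the full 14-page loop. extract_rows can also be exercised on its own against saved HTML.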
This is driving me crazy, so any help would be greatly appreciated.