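I'm scraping a table and have two problems. First, the returned string has a leading "1" and I'm having trouble iterating past it. I tried slicing with [0:] and got stuck. I'd like to skip it, or step past it, to get to the second id value.
Second, when I try to format the items returned from the table for storage, I keep getting an index out of range error. I've been using def store() for this.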
import requests
from bs4 import BeautifulSoup
import MySQLdb
# MySQL portion
mydb = MySQLdb.connect(host='****',
                       user='****',
                       passwd='****',
                       db='****')
cur = mydb.cursor()

def store(id, ticker):
    cur.execute('INSERT IGNORE INTO TEST (id, ticker) VALUES (\"%s\", \"%s\")', (id, ticker))
    cur.connection.commit()

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,24,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs = {'id':'screener-content'})
table = main_div.find('table')
sub = table.findAll('tr')
cells = sub[5].findAll('td')
for cell in cells:
    link = cell.a
    if link is not None:
        link = link.get_text()
        id = link[0]
        ticker = link[1]
        store(id, ticker)
        print(link)
Answer 0 (score: 1)
I don't know exactly what you're trying to do, but this works for me:
import requests
from bs4 import BeautifulSoup
base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,24,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
rows = soup.find_all('tr', class_=["table-dark-row-cp", "table-light-row-cp"])
for row in rows:
    columns = row.find_all('td')
    id_ = columns[0].a.get_text()
    ticker = columns[1].a.get_text()
    company = columns[2].a.get_text()
    sector = columns[3].a.get_text()
    industry = columns[4].a.get_text()
    print(id_, ticker, company, sector, industry)
Or search each row for the a tags directly:
for row in rows:
    columns = row.find_all('a')
    id_ = columns[0].get_text()
    ticker = columns[1].get_text()
    company = columns[2].get_text()
    sector = columns[3].get_text()
    industry = columns[4].get_text()
    print(id_, ticker, company, sector, industry)
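As a side note on the index out of range error: in the question's loop, link is the text of a single cell, so link[0] and link[1] pick out characters of that string, not the id and the ticker. A cell whose text is just "1" has no second character. The sketch below reproduces this without any network access; the cell_texts values are made up for illustration and are not real Finviz data.

```python
# Hypothetical cell texts, standing in for what get_text() returns for each cell
cell_texts = ["1", "ABCD", "Example Corp"]

def naive_split(text):
    # Mirrors the question's approach: indexing a string yields single characters
    return text[0], text[1]

errors = []
for text in cell_texts:
    try:
        naive_split(text)
    except IndexError:
        # A one-character string such as "1" has no index 1
        errors.append(text)

# Indexing the list of cells, as the loops above do, yields whole values
id_, ticker = cell_texts[0], cell_texts[1]
print(errors, id_, ticker)
```

This is why the loops above index into the list of columns for each row and only then call get_text().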
BTW: you can also use CSS selectors:
rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr[class]')
or
rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr')
# skip first row with headers
rows = rows[1:]
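Once each row yields a whole id and ticker, storing them could look roughly like the sketch below. It is only an illustration under assumptions: it uses the standard-library sqlite3 module with an in-memory database in place of MySQLdb so that it runs without a server, the test table layout and the scraped values are invented, and store_rows is a hypothetical helper. SQLite spells the statement INSERT OR IGNORE and uses ? placeholders; with MySQLdb the placeholder would be %s, and it is normally left unquoted because the driver does the quoting.

```python
import sqlite3

def store_rows(conn, rows):
    """Insert (id, ticker) pairs, skipping ids that are already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO test (id, ticker) VALUES (?, ?)", rows
    )
    conn.commit()

# In-memory database used purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, ticker TEXT)")

# Made-up values in the shape the scraping loops produce (strings)
scraped = [("1", "ABCD"), ("2", "EFGH"), ("1", "ABCD")]
store_rows(conn, [(int(i), t) for i, t in scraped])

stored = conn.execute("SELECT id, ticker FROM test ORDER BY id").fetchall()
print(stored)
```

Passing the values as parameters rather than formatting them into the SQL string leaves escaping to the driver, and committing once after the batch avoids a commit per row.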