Python BeautifulSoup MySQL storage and interaction

Time: 2017-02-01 23:53:29

Tags: python mysql web-scraping beautifulsoup

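First, there is a leading "1" in the returned string, and I'm having trouble iterating past it. I tried slicing with [0:] and got stuck somewhere. I want to skip that "1" so I can get the second id value. I'm scraping a table.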

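Also, when I try to format the items returned from the table so I can store them, I keep getting an "index out of range" error. I've been using def store() for the storage step.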
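Both symptoms come down to plain Python indexing. A minimal sketch, using made-up sample values rather than real scraped data: [0:] starts at index 0 and therefore keeps the leading item, [1:] skips it, and indexing a one-character string such as "1" at position 1 raises IndexError.

```python
# Hypothetical sample values standing in for scraped link texts.
link_texts = ["1", "AAPL", "Apple Inc."]

# [0:] starts at index 0, so it keeps everything, including the leading "1".
all_items = link_texts[0:]

# [1:] starts at index 1, so the leading "1" is skipped.
without_first = link_texts[1:]

# Indexing a string yields single characters, not table columns.
link = "1"
first_char = link[0]
try:
    second_char = link[1]  # a one-character string has no index 1
except IndexError as exc:
    error_message = str(exc)

print(all_items)
print(without_first)
print(first_char, error_message)
```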

import requests
from bs4 import BeautifulSoup
import MySQLdb

# MySQL portion
mydb = MySQLdb.connect(host='****',
                       user='****',
                       passwd='****',
                       db='****')
cur = mydb.cursor()

def store(id, ticker):
    # Placeholders stay unquoted: MySQLdb escapes and quotes the values itself
    cur.execute('INSERT IGNORE INTO TEST (id, ticker) VALUES (%s, %s)', (id, ticker))
    cur.connection.commit()

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,24,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs={'id': 'screener-content'})
table = main_div.find('table')
sub = table.findAll('tr')
cells = sub[5].findAll('td')

for cell in cells:
    link = cell.a
    if link is not None:
        link = link.get_text()
        # link is now a plain string, so link[0] and link[1] are single
        # characters; a one-character text such as "1" has no link[1]
        id = link[0]
        ticker = link[1]
        store(id, ticker)
    print(link)
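With DB-API drivers the placeholders should not be wrapped in quotes, because the driver escapes and quotes each value itself. A minimal sketch of the same store idea, using the standard-library sqlite3 module as an in-memory stand-in for MySQL (an assumption so the example runs without a server). sqlite3 uses ? placeholders and INSERT OR IGNORE where MySQLdb uses %s and INSERT IGNORE; the table name test and the sample pairs are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL connection (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table; the PRIMARY KEY lets duplicate ids be ignored.
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, ticker TEXT)")

def store(cur, id_, ticker):
    # Bare placeholders: the driver quotes and escapes the values itself.
    # (With MySQLdb the equivalent is "INSERT IGNORE ... VALUES (%s, %s)".)
    cur.execute("INSERT OR IGNORE INTO test (id, ticker) VALUES (?, ?)", (id_, ticker))
    cur.connection.commit()

# Hypothetical scraped pairs; the second row with id 1 is silently ignored.
for id_, ticker in [(1, "AAPL"), (2, "MSFT"), (1, "DUPLICATE")]:
    store(cur, id_, ticker)

rows = cur.execute("SELECT id, ticker FROM test ORDER BY id").fetchall()
print(rows)
```

The trailing underscore in id_ avoids shadowing Python's built-in id, the same convention the answer uses.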

1 answer:

Answer 0 (score: 1)

I don't know what you are really trying to do, but this works for me:

import requests
from bs4 import BeautifulSoup

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,24,25,63,64,65,66,67'

html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")

rows = soup.find_all('tr', class_=["table-dark-row-cp", "table-light-row-cp"])

for row in rows:
    columns = row.find_all('td')

    id_ = columns[0].a.get_text()
    ticker = columns[1].a.get_text()
    company = columns[2].a.get_text()
    sector = columns[3].a.get_text()
    industry = columns[4].a.get_text()

    print(id_, ticker, company, sector, industry)
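The class-based row filter can be tried offline against a static snippet. A minimal sketch, assuming BeautifulSoup 4 is installed; the HTML is hypothetical markup imitating the screener table (the live page may differ), with only three columns to keep it short.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the screener table; the real page may differ.
html = """
<div id="screener-content">
  <table bgcolor="#d3d3d3">
    <tr><td>No.</td><td>Ticker</td><td>Company</td></tr>
    <tr class="table-dark-row-cp">
      <td><a href="#">1</a></td><td><a href="#">AAA</a></td><td><a href="#">Alpha Corp</a></td>
    </tr>
    <tr class="table-light-row-cp">
      <td><a href="#">2</a></td><td><a href="#">BBB</a></td><td><a href="#">Beta Inc</a></td>
    </tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A list for class_ matches rows carrying ANY of the listed classes,
# so the header row (which has no class) is skipped.
rows = soup.find_all("tr", class_=["table-dark-row-cp", "table-light-row-cp"])

records = []
for row in rows:
    columns = row.find_all("td")
    # Take the text of the <a> in each cell, one cell per field,
    # instead of indexing characters of a single string.
    id_ = columns[0].a.get_text()
    ticker = columns[1].a.get_text()
    company = columns[2].a.get_text()
    records.append((id_, ticker, company))

print(records)
```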

Or, finding the a tags directly:
for row in rows:
    columns = row.find_all('a')

    id_ = columns[0].get_text()
    ticker = columns[1].get_text()
    company = columns[2].get_text()
    sector = columns[3].get_text()
    industry = columns[4].get_text()

    print(id_, ticker, company, sector, industry)

BTW: you can also use CSS selectors:

# only rows that have a class attribute
rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr[class]')

or

rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr')
# skip first row with headers
rows = rows[1:]
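The two selector variants can be compared on a static snippet. A minimal sketch, assuming BeautifulSoup 4 (with its soupsieve CSS backend) is installed; the markup is hypothetical and only imitates the screener table, where the header row carries no class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the screener table; the real page may differ.
html = """
<div id="screener-content">
  <table bgcolor="#d3d3d3">
    <tr><td>No.</td><td>Ticker</td></tr>
    <tr class="table-dark-row-cp"><td><a href="#">1</a></td><td><a href="#">AAA</a></td></tr>
    <tr class="table-light-row-cp"><td><a href="#">2</a></td><td><a href="#">BBB</a></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Variant 1: only rows that have a class attribute (the header row has none).
with_class = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr[class]')

# Variant 2: every row, then drop the first (header) row by slicing.
all_rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr')[1:]

# The second <a> in each data row holds the ticker.
tickers_1 = [row.find_all("a")[1].get_text() for row in with_class]
tickers_2 = [row.find_all("a")[1].get_text() for row in all_rows]
print(tickers_1, tickers_2)
```

The tr[class] form does not depend on the header's position, while the slicing form assumes the header is always the first matched row.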