I'm new to bs4 and I'm trying to extract a price table.
The main problem I'm facing is that on the HTML page the table is not built from a table element but from a div.
I tried to locate it by class and by id, but I can't get the prices.
This is my attempt:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")
These are the filters I applied to get the price table, without success:
table = soup.find('div', {'id': 'maidMoneyTable'})
# or, equivalently: table = soup.find(id='maidMoneyTable')
route = pd.read_html(str(table), flavor='html5lib')
print(route)
In both cases the call fails with "No tables found".
Can anyone tell me how to get the table I need?
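pd.read_html() only parses <table> elements, so a div-based layout produces "No tables found" however the div is located. A minimal sketch of the alternative, walking the div with BeautifulSoup directly. The markup is hypothetical, modelled on the selectors the answer uses (div#maidMoneyTable, li.order rows, an <a> holding nested <li> cells); the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the answer's selectors; the real page may differ.
HTML = """
<div id="maidMoneyTable">
  <ul>
    <li class="order">
      <a href="http://www.valoreazioni.com/titoli/a2a-a2a-mi">
        <ul><li>A2A.MI</li><li>A2A SpA</li><li>1.50</li></ul>
      </a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# There is no <table> anywhere, which is why pd.read_html() finds nothing.
has_table = soup.find("table") is not None

# Locate the container by id, then iterate over the rows by class.
container = soup.find(id="maidMoneyTable")
rows = []
for order in container.find_all("li", class_="order"):
    link = order.find("a")
    # One record: the link's href followed by the text of each nested <li>.
    cells = [li.get_text(strip=True) for li in link.find_all("li")]
    rows.append([link["href"]] + cells)

# The same rows can also be selected with a CSS selector.
selected = soup.select("#maidMoneyTable li.order")
```

The nested <li> cells carry no "order" class, so find_all("li", class_="order") matches only the row elements.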
Answer (score: 0)
Scrape the data from the page with BeautifulSoup, store it temporarily in an sqlite3 table, then use pandas' read_sql_query to load the data from sqlite3 into a DataFrame.
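The SQLite round trip can be sketched on its own with the standard-library sqlite3 module and an in-memory database. The single row is copied from the sample output further down and stands in for scraped data; the untyped column names are taken from that output's header.

```python
import sqlite3

import pandas as pd

# Stand-in for scraped rows: [url, symbol, name, item_1, item_2, item_3, item_4].
rows = [
    ["http://www.valoreazioni.com/titoli/a2a-a2a-mi", "A2A.MI", "A2A SpA",
     "1.50", "1.503", "0.003", "+0.200%"],
]

conn = sqlite3.connect(":memory:")  # temporary database, discarded on close
conn.execute(
    "CREATE TABLE market (url, symbol, name, item_1, item_2, item_3, item_4)"
)
# One "?" placeholder per column; executemany inserts every row.
conn.executemany("INSERT INTO market VALUES (?,?,?,?,?,?,?)", rows)
conn.commit()

df = pd.read_sql_query("SELECT * FROM market", conn)
conn.close()
```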
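>>> import sqlite3
>>> import bs4
>>> import pandas as pd
>>> import requests
>>> page = requests.get('http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi').content
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> maidMoneyTable = soup.find(id='maidMoneyTable')  # find(), not find_all(): a single Tag
>>> table_rows = maidMoneyTable.findAll('li', attrs={'class': 'order'})
>>> conn = sqlite3.connect(':memory:')  # temporary in-memory database
>>> c = conn.cursor()
>>> result = c.execute('CREATE TABLE market (url, symbol, name, item_1, item_2, item_3, item_4)')
>>> for row in table_rows:
...     link = row.find('a')
...     # one record: the link's href followed by the text of each <li> inside it
...     data = [link.attrs['href']] + [_.text for _ in link.findAll('li')]
...     result = c.execute('INSERT INTO market VALUES (?,?,?,?,?,?,?)', data)
...
>>> df = pd.read_sql_query('SELECT * FROM market', conn)
>>> df.head()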
url symbol \
0 http://www.valoreazioni.com/titoli/a2a-a2a-mi A2A.MI
1 http://www.valoreazioni.com/titoli/anima-holdi... ANIM.MI
2 http://www.valoreazioni.com/titoli/atlantia-at... ATL.MI
3 http://www.valoreazioni.com/titoli/azimut-hold... AZM.MI
4 http://www.valoreazioni.com/titoli/banca-medio... BMED.MI
name item_1 item_2 item_3 item_4
0 A2A SpA 1.50 1.503 0.003 +0.200%
1 ANIMA HOLDING SPA 6.26 6.210 -0.040 -0.64%
2 ATLANTIA 25.96 25.640 -0.240 -0.93%
3 AZIMUT HOLDING 17.94 17.930 0.060 +0.34%
4 BANCA MEDIOLANUM 7.43 7.290 -0.150 -2.02%
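The SQLite step is optional: pandas can build the DataFrame straight from the list of scraped rows. A minimal sketch, assuming rows in the same seven-column layout as above (the one row shown is copied from the sample output). Because BeautifulSoup returns text, it also converts the item columns to floats, stripping the trailing "%" from item_4 first.

```python
import pandas as pd

# Stand-in for scraped rows, in the same layout the answer stores in SQLite.
rows = [
    ["http://www.valoreazioni.com/titoli/a2a-a2a-mi", "A2A.MI", "A2A SpA",
     "1.50", "1.503", "0.003", "+0.200%"],
]
columns = ["url", "symbol", "name", "item_1", "item_2", "item_3", "item_4"]
df = pd.DataFrame(rows, columns=columns)

# Scraped values are strings; convert the plain numeric columns to floats.
for col in ["item_1", "item_2", "item_3"]:
    df[col] = df[col].astype(float)
# item_4 has a trailing "%" that must be stripped before conversion.
df["item_4"] = df["item_4"].str.rstrip("%").astype(float)
```

Skipping the database keeps everything in memory. SQLite is only worth adding if the data must persist between runs.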