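Scraping a table from a data site with Python and Beautiful Soup

Time: 2018-10-11 20:27:34

Tags: python beautifulsoup

I am a beginner in Python and I am stuck trying to scrape the whole table from https://wow-pets.com/compare/eu/silvermoon/kazzak. This is what I started with: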

import urllib.request
from time import sleep

from bs4 import BeautifulSoup

WAIT_PERIOD = 20  # seconds to pause after each request

def make_soup(url):
    # Send a browser-like User-Agent header with the request
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(request).read()
    sleep(WAIT_PERIOD)
    return BeautifulSoup(html, "html.parser")

petdata = ""
soup = make_soup("https://wow-pets.com/compare/eu/draenor/silvermoon")

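After that I tried, but I could not pull out the table cells with the pet names, prices, and so on. My main goal is to calculate the best ratio and print out the best results.

Thanks for your help!! :)

1 answer:

Answer 0: (score: 2)

The site appears to use a script to fill in the table listing: when I inspected the table's structure, a plain requests.get call returned only the empty header tags. To get around this, use a browser automation tool such as selenium: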

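from bs4 import BeautifulSoup as soup
from selenium import webdriver
import re
# Selenium 3 style call; Selenium 4 takes the driver path through
# selenium.webdriver.chrome.service.Service (webdriver.Chrome(service=Service(path)))
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://wow-pets.com/compare/eu/silvermoon/kazzak')
# Parse the rendered page and pick out the sortable table
page = soup(d.page_source, 'html.parser').find('table', {'class':'table-sortable'})
# Column names from the header row
headers = [i.text for i in page.find('thead').find_all('th')]
# One list of cell texts per body row
main_table = [[c.text for c in i.find_all('td')] for i in page.find('tbody').find_all('tr')]
# Strip newlines from the first cell (the pet name) and pair each row with the headers
final_results = [dict(zip(headers, [re.sub('\n+', '', a), *b])) for a, *b in main_table]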

Output (first ten results):

[{'Pet name': 'Hippogryph Hatchling', 'Silvermoon': '499,999', 'Kazzak': '313,949', 'Diff.': '▼ 37%', 'Global price': '668,709'}, {'Pet name': 'Spectral Tiger Cub', 'Silvermoon': '492,711', 'Kazzak': '400,000', 'Diff.': '▼ 19%', 'Global price': '876,368'}, {'Pet name': 'Nightsaber Cub', 'Silvermoon': '304,836', 'Kazzak': '250,000', 'Diff.': '▼ 18%', 'Global price': '671,397'}, {'Pet name': 'Everliving Spore', 'Silvermoon': '301,000', 'Kazzak': '439,993', 'Diff.': '▲ 46%', 'Global price': '691,879'}, {'Pet name': 'Dragon Kite', 'Silvermoon': '297,234', 'Kazzak': '359,987', 'Diff.': '▲ 21%', 'Global price': '628,084'}, {'Pet name': 'Rocket Chicken', 'Silvermoon': '284,053', 'Kazzak': '309,999', 'Diff.': '▲ 9%', 'Global price': '651,913'}, {'Pet name': 'Tuskarr Kite', 'Silvermoon': '278,595', 'Kazzak': '299,998', 'Diff.': '▲ 8%', 'Global price': '635,809'}, {'Pet name': 'Guardian Cub', 'Silvermoon': '267,741', 'Kazzak': '299,999', 'Diff.': '▲ 12%', 'Global price': '716,485'}, {'Pet name': "Landro's Lichling", 'Silvermoon': '247,999', 'Kazzak': '200,000', 'Diff.': '▼ 19%', 'Global price': '565,617'}, {'Pet name': 'Bananas', 'Silvermoon': '239,431', 'Kazzak': '278,711', 'Diff.': '▲ 16%', 'Global price': '540,228'}]
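
The question's stated goal was to calculate the best ratio and print the best results. A minimal sketch of one way to do that from rows shaped like the output above follows. It assumes that "ratio" means the sell-realm price divided by the buy-realm price and that every price cell is a comma-grouped integer; the thread confirms neither. The helper names parse_price and rank_by_ratio are hypothetical and not part of the original answer.

```python
def parse_price(text):
    """Turn a price string such as '499,999' into the integer 499999."""
    return int(text.replace(',', ''))


def rank_by_ratio(rows, buy_realm, sell_realm):
    """Return (pet name, buy price, sell price, ratio) tuples, best ratio first.

    Assumption: ratio = sell price / buy price, so a value above 1 means the
    pet is more expensive on the sell realm than on the buy realm.
    """
    ranked = []
    for row in rows:
        buy = parse_price(row[buy_realm])
        sell = parse_price(row[sell_realm])
        if buy == 0:
            continue  # skip rows without a usable buy price
        ranked.append((row['Pet name'], buy, sell, sell / buy))
    return sorted(ranked, key=lambda item: item[3], reverse=True)


# Three rows copied from the output above; in practice pass final_results.
sample = [
    {'Pet name': 'Hippogryph Hatchling', 'Silvermoon': '499,999',
     'Kazzak': '313,949', 'Diff.': '▼ 37%', 'Global price': '668,709'},
    {'Pet name': 'Everliving Spore', 'Silvermoon': '301,000',
     'Kazzak': '439,993', 'Diff.': '▲ 46%', 'Global price': '691,879'},
    {'Pet name': 'Dragon Kite', 'Silvermoon': '297,234',
     'Kazzak': '359,987', 'Diff.': '▲ 21%', 'Global price': '628,084'},
]

for name, buy, sell, ratio in rank_by_ratio(sample, 'Silvermoon', 'Kazzak'):
    print(f'{name}: buy {buy}, sell {sell}, ratio {ratio:.2f}')
```

Swapping the two realm arguments ranks the opposite direction (buy on Kazzak, sell on Silvermoon). If the live table contains empty or non-numeric price cells, parse_price would need extra handling.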