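I'm building a web scraper to collect the upcoming UFC fight events listed on William Hill (https://sports.williamhill.com/betting/en-gb/ufc). I'm using Beautiful Soup, but I haven't managed to scrape the data I need yet.
I need the fighters' names and their odds.
I've tried several approaches, including scraping different tags, but nothing comes back.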
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://sports.williamhill.com/betting/en-gb/ufc"

def scrape_data():
    f1, f2, f1_odds, f2_odds = [], [], [], []
    data = requests.get(BASE_URL)
    soup = BeautifulSoup(data.text, 'html.parser')
    anchors = soup.find_all('a', {'class': 'btmarket__name btmarket__name--featured'}, href=True)
    # Build a separate list of URLs; appending to a list while iterating over it never ends.
    links = [urljoin(BASE_URL, anchor.get('href')) for anchor in anchors]
    for link in links:
        print(f"Now currently scraping link: {link}")
        data = requests.get(link)
        soup = BeautifulSoup(data.text, 'html.parser')
        time.sleep(1)
        fighters = soup.find_all('p', {'class': "btmarket__name"})
        odds = soup.find_all('span', {'class': "betbutton_odds"})
        if len(fighters) < 2 or len(odds) < 2:
            continue  # the selectors matched nothing on this page
        f1.append(fighters[0].text.strip())
        f2.append(fighters[1].text.strip())
        f1_odds.append(odds[0].text.strip())
        f2_odds.append(odds[1].text.strip())
    return f1, f2, f1_odds, f2_odds
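I'd like it to export to a CSV file. I'm currently using Morph.io to host and run the scraper, but it returns nothing.
If it worked correctly, it would output the fighters' names and odds for every fight.
Any help would be greatly appreciated.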
Answer 0 (score: 0)
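The HTML returned to requests has different attributes and values from the ones your code looks for. You need to inspect the response itself.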
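One way to inspect the response is to tally which class names and `data-*` attributes actually occur in it. The following is a minimal sketch using only the standard library. The `AttributeTally` class, the `tally_attributes` helper, and the `sample_html` string are hypothetical; the sample is a stand-in for `requests.get(url).text`, not the real page markup.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeTally(HTMLParser):
    """Count class names and data-* attribute names seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.data_attrs = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())
            elif name.startswith("data-"):
                self.data_attrs[name] += 1


def tally_attributes(html):
    """Return (class counts, data-* attribute counts) for an HTML string."""
    parser = AttributeTally()
    parser.feed(html)
    parser.close()
    return parser.classes, parser.data_attrs


# Hypothetical stand-in for the downloaded page; replace with the real response text.
sample_html = """
<div class="btmarket">
  <a class="btmarket__name" title="Fighter A vs Fighter B" href="/x">Fighter A vs Fighter B</a>
  <button data-odds="1/5">1/5</button>
  <button data-odds="7/2">7/2</button>
</div>
"""

classes, data_attrs = tally_attributes(sample_html)
print(classes.most_common(5))
print(data_attrs.most_common(5))
```

If a class you are searching for never shows up in the tally, the selector cannot match, whatever the browser's developer tools display.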
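To write the CSV, you need to prefix each odds value with "'" so that the odds are not parsed as numbers or dates. See the commented-out alternative lines in the pandas code below.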
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

r = requests.get('https://sports.williamhill.com/betting/en-gb/ufc')
soup = bs(r.content, 'lxml')
results = []

# Visit every market block that contains at least one element with a data-odds attribute
for item in soup.select('.btmarket:has([data-odds])'):
    match_name = item.select_one('.btmarket__name[title]')['title']
    odds = [i['data-odds'] for i in item.select('[data-odds]')]
    row = {'event-starttime' : item.select_one('[datetime]')['datetime']
           ,'match_name' : match_name
           ,'home_name' : match_name.split(' vs ')[0]
           #,'home_odds' : "'" + str(odds[0])
           ,'home_odds' : odds[0]
           ,'away_name' : match_name.split(' vs ')[1]
           ,'away_odds' : odds[1]
           #,'away_odds' : "'" + str(odds[1])
           }
    results.append(row)

df = pd.DataFrame(results, columns = ['event-starttime','match_name','home_name','home_odds','away_name','away_odds'])
print(df.head())

# write to csv
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig', index = False)
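The apostrophe prefix mentioned in this answer can also be applied when the file is written, rather than when each row is built. This is a rough sketch with the standard `csv` module, writing to an in-memory buffer so it runs without touching the disk. The `rows_to_csv` helper, its `protect_odds` flag, and the sample row are hypothetical and only illustrate the idea.

```python
import csv
import io


def rows_to_csv(rows, protect_odds=True):
    """Serialise fight rows to CSV text, optionally prefixing odds with an apostrophe."""
    fieldnames = ["home_name", "home_odds", "away_name", "away_odds"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = dict(row)  # copy so the caller's data is left unchanged
        if protect_odds:
            for key in ("home_odds", "away_odds"):
                out[key] = "'" + str(out[key])
        writer.writerow(out)
    return buffer.getvalue()


# Hypothetical sample data, not real odds.
sample_rows = [
    {"home_name": "Fighter A", "home_odds": "1/5",
     "away_name": "Fighter B", "away_odds": "7/2"},
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Keeping the prefix out of the scraped rows means the raw odds stay usable for any later calculation, and only the exported file carries the extra character.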