Web scraping problem (William Hill - UFC odds)

Date: 2019-08-12 17:03:47

Tags: python web-scraping beautifulsoup

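I am building a web scraper that will fetch the odds for upcoming UFC fight events from William Hill. I am using Beautiful Soup but have not yet managed to scrape the data I need. (https://sports.williamhill.com/betting/en-gb/ufc)

I need the fighters' names and their odds.

I have tried several approaches, including scraping different tags, but nothing comes back.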

import time

import requests
from bs4 import BeautifulSoup

# Result lists filled in by scrape_data()
f1, f2, f1_odds, f2_odds = [], [], [], []


def scrape_data():
    data = requests.get("https://sports.williamhill.com/betting/en-gb/ufc")
    soup = BeautifulSoup(data.text, 'html.parser')
    anchors = soup.find_all('a', {'class': 'btmarket__name btmarket__name--featured'}, href=True)

    # Collect the hrefs in a separate list; appending to the list
    # that is being iterated over would never terminate.
    links = []
    for anchor in anchors:
        links.append(anchor.get('href'))

    for link in links:
        print(f"Now currently scraping link: {link}")

        data = requests.get(link)
        soup = BeautifulSoup(data.text, 'html.parser')
        time.sleep(1)

        fighters = soup.find_all('p', {'class': "btmarket__name"})
        c = fighters[0].text.strip()
        d = fighters[1].text.strip()

        f1.append(c)
        f2.append(d)

        odds = soup.find_all('span', {'class': "betbutton_odds"})

        a = odds[0].text.strip()
        b = odds[1].text.strip()

        f1_odds.append(a)
        f2_odds.append(b)

    return None

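I would like it to export to a CSV file. I am currently using Morph.io to host and run the scraper, but it returns nothing.

If it worked correctly, it would output:

  1. Fighter1Name:
  2. Fighter2Name:
  3. F1 odds:
  4. F2 odds:

for each fight.
Any help would be greatly appreciated.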

1 answer:

Answer 0 (score: 0)

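The HTML that is returned has different attributes and values from the ones you are searching for. You need to inspect the response.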
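One way to inspect the response is to list which class names and `data-*` attributes actually occur in the HTML that was received. The sketch below uses only the standard library; the `AttributeSurvey` class and the sample markup are hypothetical and stand in for the text of the real response.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeSurvey(HTMLParser):
    """Counts class tokens and data-* attribute names seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.data_attrs = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated tokens
                self.classes.update(value.split())
            elif name.startswith("data-"):
                self.data_attrs[name] += 1


# Hypothetical sample; in practice feed the text of the real response instead.
sample_html = """
<div class="btmarket">
  <a class="btmarket__name" title="Fighter A vs Fighter B" href="/event/1"></a>
  <button class="betbutton" data-odds="5/2"></button>
  <button class="betbutton" data-odds="1/3"></button>
</div>
"""

survey = AttributeSurvey()
survey.feed(sample_html)
print(survey.classes.most_common())
print(survey.data_attrs.most_common())
```

If a class used in a selector does not show up in the survey, that selector cannot match anything in the response.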

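To write the results out to CSV, you will want to prepend "'" to the odds so that they are not treated as fractions or dates (for example, by a spreadsheet program that opens the file). See the commented-out alternatives in the code below.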
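To illustrate the quoting trick, here is a small sketch that writes fractional odds to CSV text with and without the leading apostrophe. The fighter names, the odds, and the `as_text` helper are made up for the example; how a spreadsheet interprets either form depends on the application.

```python
import csv
import io


def as_text(odds):
    """Prefix odds with an apostrophe so a spreadsheet is more likely to keep them as text."""
    return "'" + str(odds)


rows = [("Fighter A", "5/2"), ("Fighter B", "1/3")]  # hypothetical data

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "odds_raw", "odds_text"])
for name, odds in rows:
    # Write both forms side by side for comparison
    writer.writerow([name, odds, as_text(odds)])

csv_text = buffer.getvalue()
print(csv_text)
```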

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

r = requests.get('https://sports.williamhill.com/betting/en-gb/ufc')
soup = bs(r.content, 'lxml')
results = []

for item in soup.select('.btmarket:has([data-odds])'):
    match_name = item.select_one('.btmarket__name[title]')['title']
    odds = [i['data-odds'] for i in item.select('[data-odds]')]
    row = {
        'event-starttime': item.select_one('[datetime]')['datetime'],
        'match_name': match_name,
        'home_name': match_name.split(' vs ')[0],
        # 'home_odds': "'" + str(odds[0]),
        'home_odds': odds[0],
        'away_name': match_name.split(' vs ')[1],
        'away_odds': odds[1],
        # 'away_odds': "'" + str(odds[1]),
    }
    results.append(row)

df = pd.DataFrame(results, columns = ['event-starttime','match_name','home_name','home_odds','away_name','away_odds'])
print(df.head())
#write to csv
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
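
Note that `match_name.split(' vs ')[1]` raises an `IndexError` if a title does not contain the separator. A more defensive variant could use `str.partition`. The `split_match` helper below is a hypothetical illustration rather than part of the code above, and the example titles are invented.

```python
def split_match(match_name, separator=" vs "):
    """Return (home, away); away is None when the separator is missing."""
    home, found, away = match_name.partition(separator)
    if not found:
        # No separator: keep the whole title as the first name
        return match_name.strip(), None
    return home.strip(), away.strip()


print(split_match("Fighter A vs Fighter B"))
print(split_match("UFC Outright Winner"))
```

Rows where the second value is `None` could then be skipped or logged instead of stopping the whole run.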