Need to scrape data from a website using XPath and BeautifulSoup

Time: 2020-10-21 14:36:50

Tags: python xpath web-scraping beautifulsoup

Hello everyone,

website link

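The story is that I am trying to scrape a table named "Open Bets", but unfortunately the table has no class or ID. I used BeautifulSoup to scrape the table and XPath to locate it, but as you can see in the picture below: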

enter image description here

I am trying to scrape the data from the table and pick out the columns named "Team A" and "Team B". The goal is to display the data like this:

print(Player1," vs ",Player2)
print("Odds ",odds)
print("Rate ",rate)
print("stake ",stake)

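I think you can see what I am trying to do here. This is the table: enter image description here

I tried contacting the site administrator to have a class or some other attribute added to the page source, but got no response.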

from lxml import html
import requests
page = requests.get('https://tipsters.asianbookie.com/index.cfm?player=Mitya68&ID=297754')
tree = html.fromstring(page.content)
ID = tree.xpath('/html/body/table[2]/tbody/tr/td[3]/table[7]')
print(ID)

This is the code I used. If anyone can help, that would be very useful =)

1 answer:

Answer 0: (score: 1)

A simple approach is to use pandas. Here is how you can do it:

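from io import StringIO

import pandas as pd
import requests

# Download the page; read_html parses every <table> it contains.
r = requests.get('https://tipsters.asianbookie.com/index.cfm?player=Mitya68&ID=297754&sortleague=1#playersopenbets&tz=5.5').text

# Recent pandas versions deprecate passing a raw HTML string, so wrap it in StringIO.
dfs = pd.read_html(StringIO(r))

# Index 141 was the "Open Bets" table when this was written; it will shift if the page layout changes.
df = dfs[141]

# The first row holds the column names: promote it to the header, then drop it.
df.columns = df.iloc[0]

df = df.drop(0)

# Keep only the text after the last '.' in each "Bet Placed" cell.
df['Bet Placed ≡'] = [value.split('.')[-1] for value in df['Bet Placed ≡']]

print(df)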

Output:

0   Bet Placed ≡              Team A  ...   Rate         Pending Status
1    9 hours ago         Real Madrid  ...  1.975            pending ?-?
2    9 hours ago   Red Bull Salzburg  ...  1.875            pending ?-?
3    9 hours ago                Ajax  ...   2.00            pending ?-?
4    9 hours ago       Bayern Munich  ...   2.00            pending ?-?
5    9 hours ago       Bayern Munich  ...   1.85            pending ?-?
6    9 hours ago         Inter Milan  ...  1.875            pending ?-?
7    9 hours ago     Manchester City  ...   1.95            pending ?-?
8    9 hours ago         Midtjylland  ...  1.875            pending ?-?
9    9 hours ago  Olympiakos Piraeus  ...   1.95            pending ?-?
10   9 hours ago          Hamburg SV  ...  1.925            pending ?-?
11   9 hours ago         Vissel Kobe  ...  1.925   Lost(-25,000) FT 1-3
12   9 hours ago     Shonan Bellmare  ...  1.825   Won½(+10,313) FT 0-0
13   9 hours ago    Yokohama Marinos  ...  2.025   Won½(+12,812) FT 2-1
14   9 hours ago        RKC Waalwijk  ...  1.875            pending ?-?
15   9 hours ago            Espanyol  ...  2.075  lose(-25,000) 29' 1-0

[15 rows x 7 columns]
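A hard-coded table index is fragile, because it changes whenever the page gains or loses a table. As a rough sketch of one alternative, the table can be picked by the labels in its first row instead. The `SAMPLE_HTML` string and the `find_table` helper below are made up for illustration and stand in for the downloaded page; the real page is much larger and may need adjustments.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two tables without class or id.
SAMPLE_HTML = """
<table><tr><td>Menu</td><td>Login</td></tr></table>
<table>
  <tr><td>Team A</td><td>Team B</td><td>Stake</td><td>Rate</td></tr>
  <tr><td>Ajax</td><td>Liverpool</td><td>25000.00</td><td>2.00</td></tr>
  <tr><td>Espanyol</td><td>Mirandes</td><td>25000.00</td><td>2.075</td></tr>
</table>
"""


def find_table(html, required):
    """Return the first parsed table whose first row contains all `required` labels."""
    for candidate in pd.read_html(StringIO(html)):
        if candidate.empty:
            continue
        first_row = [str(value).strip() for value in candidate.iloc[0]]
        if set(required) <= set(first_row):
            # Promote the first row to the header and keep the remaining rows.
            table = candidate.iloc[1:].reset_index(drop=True)
            table.columns = first_row
            return table
    raise ValueError("no table with the required header cells was found")


open_bets = find_table(SAMPLE_HTML, ["Team A", "Team B"])
print(open_bets)
```

On a page with nested tables, an outer table may match first, so the list of required labels may need to be more specific.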

You can also get these values as separate lists by adding the following lines to the code:

team_a = list(df['Team A'])
team_b = list(df['Team B'])
rate = list(df['Rate'])
stake = list(df['Stake'])
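Because the header row was part of the data when the table was parsed, the values in these columns are most likely strings rather than numbers. If you need to do arithmetic on them, a conversion along these lines should work. The small DataFrame here is an illustrative stand-in built from two rows of the table, not the scraped result itself.

```python
import pandas as pd

# Stand-in rows in the same shape as the scraped table (all values are strings).
df = pd.DataFrame({
    "Team A": ["Real Madrid", "Red Bull Salzburg"],
    "Team B": ["Shaktar Donetsk", "Lokomotiv Moscow"],
    "Stake": ["25000.00", "100000.00"],
    "Rate": ["1.975", "1.875"],
})

# errors="coerce" turns any cell that is not a number into NaN instead of raising.
for column in ("Stake", "Rate"):
    df[column] = pd.to_numeric(df[column], errors="coerce")

total_stake = df["Stake"].sum()
print(total_stake)
```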

If you want to print them in the format you mentioned, add the following lines to the code:

final_lst = zip(team_a,team_b,stake,rate)

for teamA,teamB,stakee,ratee in final_lst:
    print(f"{teamA} vs {teamB} - Stake: {stakee}, Rate: {ratee}")

Output:

Real Madrid vs Shaktar Donetsk - Stake: 25000.00, Rate: 1.975
Red Bull Salzburg vs Lokomotiv Moscow - Stake: 100000.00, Rate: 1.875
Ajax vs Liverpool - Stake: 25000.00, Rate: 2.00
Bayern Munich vs Atl. Madrid - Stake: 25000.00, Rate: 2.00
Bayern Munich vs Atl. Madrid - Stake: 25000.00, Rate: 1.85
Inter Milan vs Monchengladbach - Stake: 25000.00, Rate: 1.875
Manchester City vs Porto - Stake: 25000.00, Rate: 1.95
Midtjylland vs Atalanta - Stake: 100000.00, Rate: 1.875
Olympiakos Piraeus vs Marseille - Stake: 25000.00, Rate: 1.95
Hamburg SV vs Erzgebirge Aue - Stake: 100000.00, Rate: 1.925
Vissel Kobe vs Kashima Antlers - Stake: 25000.00, Rate: 1.925
Shonan Bellmare vs Sagan Tosu - Stake: 25000.00, Rate: 1.825
Yokohama Marinos vs Nagoya - Stake: 25000.00, Rate: 2.025
RKC Waalwijk vs PEC Zwolle - Stake: 25000.00, Rate: 1.875
Espanyol vs Mirandes - Stake: 25000.00, Rate: 2.075
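If you would rather stay with lxml and XPath, as in the question, here is a minimal sketch that locates the table by the text of a header cell instead of by an absolute path. Two things are assumptions on my part: the `SAMPLE_HTML` string is a made-up stand-in for the real page, and the idea that the `tbody` step in the original XPath is what made it return nothing. Browsers add `<tbody>` to the DOM even when the served HTML does not contain it, so a path copied from the browser inspector often fails against the raw response.

```python
from lxml import html

# Hypothetical stand-in for the page: the target table is nested and has no class or id.
SAMPLE_HTML = """
<html><body>
<table><tr><td>
  <table>
    <tr><td>Team A</td><td>Team B</td><td>Stake</td><td>Rate</td></tr>
    <tr><td>Ajax</td><td>Liverpool</td><td>25000.00</td><td>2.00</td></tr>
    <tr><td>Espanyol</td><td>Mirandes</td><td>25000.00</td><td>2.075</td></tr>
  </table>
</td></tr></table>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# Select the innermost table that has a cell reading exactly "Team A":
# the second predicate rejects outer tables that merely wrap such a table.
tables = tree.xpath(
    "//table[.//td[normalize-space()='Team A']]"
    "[not(.//table[.//td[normalize-space()='Team A']])]"
)

# Accept rows with or without an explicit <tbody> in the source.
rows = tables[0].xpath("./tr | ./tbody/tr")
header = [cell.text_content().strip() for cell in rows[0].xpath("./td")]
bets = [
    dict(zip(header, (cell.text_content().strip() for cell in row.xpath("./td"))))
    for row in rows[1:]
]

for bet in bets:
    print(bet["Team A"], " vs ", bet["Team B"])
    print("Rate ", bet["Rate"])
    print("stake ", bet["Stake"])
```

Matching on cell text keeps working when tables are added or removed elsewhere on the page, though it will break if the site renames the column.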