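I want to scrape match results from this page: https://www.tennisexplorer.com/player/paire-4a33b/
From the scraped results I want to build a table with the columns tournament, date, match_player_1, match_player_2, round, score. I wrote some code that works, but I don't know how to add the tournament to each match row.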
import requests
from bs4 import BeautifulSoup

u = 'https://www.tennisexplorer.com/player/paire-4a33b/'
# Placeholder: the original snippet used `headers` without showing its definition.
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(u, timeout=120, headers=headers)
# print(r.status_code)
soup = BeautifulSoup(r.content, 'html.parser')
for tr in soup.select('#matches-2020-1-data tr'):
    match_date = tr.select_one('td:nth-of-type(1)').get_text(strip=True)
    match_surface = tr.select_one('td:nth-of-type(2)').get_text(strip=True)
    match = tr.select_one('td:nth-of-type(3)').get_text(strip=True)
    # ...
I need to create a table like this:
tournament                     date    match_player_1  match_player_2  round  score
Cincinnati Masters (New York)  22.08.  Coric B.        Paire B.        1R     6-0, 1-0
Ultimate Tennis Showdown 2     01.08.  Moutet C.       Paire B.        NaN    15-0, 15-0, 15-0, 15-0
How can I associate the tournament with each match?
Answer 0 (score: 3)
To get the desired DataFrame, you can do the following:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.tennisexplorer.com/player/paire-4a33b/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# Select only match rows; rows containing <th> cells are skipped.
for row in soup.select('#matches-2020-1-data tr:not(:has(th))'):
    tds = [td.get_text(strip=True, separator=' ') for td in row.select('td')]
    all_data.append({
        # The nearest preceding header row holds the tournament name.
        'tournament': row.find_previous('tr', class_='head flags').find('td').get_text(strip=True),
        'date': tds[0],
        'match_player_1': tds[2].split('-')[0].strip(),
        'match_player_2': tds[2].split('-')[-1].strip(),
        'round': tds[3],
        'score': tds[4]
    })

df = pd.DataFrame(all_data)
df.to_csv('data.csv')
This saves data.csv.
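The key step in this answer is `find_previous`, which walks backwards from a match row to the nearest tournament header row. Here is a minimal offline sketch of that idea. The HTML snippet and the `matches` id are made up for illustration; the only thing taken from the answer is the `head flags` class on header rows, so the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one header row per tournament,
# followed by that tournament's match rows.
html = """
<table id="matches">
  <tr class="head flags"><td colspan="5">Cincinnati Masters (New York)</td></tr>
  <tr><td>22.08.</td><td>H</td><td>Coric B. - Paire B.</td><td>1R</td><td>6-0, 1-0</td></tr>
  <tr class="head flags"><td colspan="5">Ultimate Tennis Showdown 2</td></tr>
  <tr><td>01.08.</td><td>H</td><td>Moutet C. - Paire B.</td><td></td><td>15-0, 15-0, 15-0, 15-0</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("#matches tr"):
    if "head" in tr.get("class", []):
        continue  # skip the tournament header rows themselves
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    # Nearest preceding header row in document order.
    header = tr.find_previous("tr", class_="head flags")
    rows.append((header.td.get_text(strip=True), tds[0], tds[2]))

for tournament, date, players in rows:
    print(tournament, date, players, sep=" | ")
```

Because `find_previous` searches in document order, each match row picks up the header directly above it, no matter how many matches a tournament has.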
Answer 1 (score: 1)
Try this:
import pandas as pd

url = "https://www.tennisexplorer.com/player/paire-4a33b/"
# The table index (8) depends on the page layout and may change.
df = pd.read_html(url)[8]
new_data = {"tournament": [], "date": [], "match_player_1": [], "match_player_2": [],
            "round": [], "score": []}
tourn = None  # most recent tournament header seen so far
for index, row in df.iterrows():
    try:
        # Match rows start with a date such as "22.08."; header rows hold the tournament name.
        float(row.iloc[0][:-1])
    except (ValueError, TypeError):
        tourn = row.iloc[0]
        continue
    players = row.iloc[2].split("-")
    new_data["tournament"].append(tourn)
    new_data["date"].append(row.iloc[0])
    new_data["match_player_1"].append(players[0])
    new_data["match_player_2"].append(players[1])
    new_data["round"].append(row.iloc[3])
    new_data["score"].append(row.iloc[4])
data = pd.DataFrame(new_data)
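This answer carries the last seen tournament name forward to the match rows that follow it. The same idea can be sketched without an explicit loop using `where` and `ffill`. The small DataFrame below is hand-made stand-in data, not output from the real page. It assumes that a header row repeats the tournament name across its columns and that players are separated by " - ".

```python
import pandas as pd

# Stand-in for the scraped table: header rows interleaved with match rows.
raw = pd.DataFrame(
    [
        ["Cincinnati Masters (New York)"] * 5,
        ["22.08.", "H", "Coric B. - Paire B.", "1R", "6-0, 1-0"],
        ["Ultimate Tennis Showdown 2"] * 5,
        ["01.08.", "H", "Moutet C. - Paire B.", None, "15-0, 15-0, 15-0, 15-0"],
    ]
)

# Match rows start with a date like "22.08."; anything else is treated as a header.
is_match = raw[0].str.fullmatch(r"\d{2}\.\d{2}\.")

# Keep the first column only on header rows, then fill it downwards.
tournament = raw[0].where(~is_match).ffill()

# Split on the spaced separator once, so a hyphenated surname stays intact.
players = raw.loc[is_match, 2].str.split(" - ", n=1, expand=True)

result = pd.DataFrame(
    {
        "tournament": tournament[is_match],
        "date": raw.loc[is_match, 0],
        "match_player_1": players[0],
        "match_player_2": players[1],
        "round": raw.loc[is_match, 3],
        "score": raw.loc[is_match, 4],
    }
).reset_index(drop=True)

print(result)
```

Detecting match rows with an explicit date pattern avoids relying on an exception to tell headers from matches. The pattern would need adjusting if the site formats dates differently.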