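I want to retrieve a specific statistic (PPDA) for multiple matches on this site:
https://understat.com/match/xxxx
I wrote the code below to parse the HTML and loop over each match in Python, but I am struggling to extract the specific statistic and load it into a CSV file and a chart. I am a beginner, so any help would be greatly appreciated!
Code: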
import pandas as pd
import re
import random
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import datetime
import csv

for i in range(9577, 9807):
    ppda_url = 'https://understat.com/match/' + str(i)
    ppda_data = requests.get(ppda_url)
    ppda_html = ppda_data.content
    soup = BeautifulSoup(ppda_html, 'lxml')

    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(options=options)
    driver.get(ppda_url)
    soup = BeautifulSoup(driver.page_source, 'lxml')
Answer 0 (score: 0)
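To extract the data with BeautifulSoup and write it to a CSV file, first find the div element whose text is PPDA. Then find the next div element with the class progress-value, and the next div with that class after it. These two divs hold the home and away values. Write them to the CSV file like this: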
import requests
from bs4 import BeautifulSoup
import csv

with open('ppda.csv', 'w', newline='') as csvfile:
    # Create the writer once, outside the loop
    writer = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for i in range(9577, 9807):
        ppda_url = 'https://understat.com/match/' + str(i)
        ppda_data = requests.get(ppda_url)
        ppda_html = ppda_data.content
        soup = BeautifulSoup(ppda_html, 'lxml')
        # Locate the div labelled "PPDA"; skip pages that do not have one
        ppda = soup.find("div", string='PPDA')
        if ppda is None:
            continue
        # The next two "progress-value" divs hold the home and away values
        home = ppda.find_next('div', {'class': "progress-value"})
        away = home.find_next('div', {'class': "progress-value"})
        print(home.text, away.text)
        writer.writerow([home.text, away.text])
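The lookup described above can be tried offline before running it against the site. The following is only a sketch: the SAMPLE_HTML markup and the extract_ppda helper are hypothetical, written to mirror the structure the answer relies on (a div with the text PPDA followed by two divs with the class progress-value), and the numbers in it are made up. The real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values, modelled on the structure
# the answer assumes. It is not copied from the real site.
SAMPLE_HTML = """
<div class="progress-bar">
  <div class="progress-title">PPDA</div>
  <div class="progress-home"><div class="progress-value">8.50</div></div>
  <div class="progress-away"><div class="progress-value">12.25</div></div>
</div>
"""


def extract_ppda(html):
    """Return (home, away) PPDA as floats, or None if they cannot be found."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the div whose only content is the text "PPDA"
    label = soup.find("div", string="PPDA")
    if label is None:
        return None
    # Take the next two divs with class "progress-value" after the label
    values = label.find_all_next("div", class_="progress-value", limit=2)
    if len(values) < 2:
        return None
    try:
        return tuple(float(v.get_text(strip=True)) for v in values)
    except ValueError:
        # The text was not a number
        return None


print(extract_ppda(SAMPLE_HTML))
print(extract_ppda("<html></html>"))
```

Returning None instead of raising lets the calling loop skip a match whose page has no PPDA block, rather than stopping partway through the range.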
To plot a chart, matplotlib is a good place to start.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt

rows = []
for i in range(9577, 9807):
    ppda_url = 'https://understat.com/match/' + str(i)
    ppda_data = requests.get(ppda_url)
    ppda_html = ppda_data.content
    soup = BeautifulSoup(ppda_html, 'lxml')
    ppda = soup.find("div", string='PPDA')
    if ppda is None:
        continue
    home = ppda.find_next('div', {'class': "progress-value"})
    away = home.find_next('div', {'class': "progress-value"})
    print(home.text, away.text)
    # DataFrame.append was removed in pandas 2.0, so collect rows in a list
    rows.append({'HOME': float(home.text), 'AWAY': float(away.text)})

# Build the DataFrame once, after the loop
df = pd.DataFrame(rows, columns=['HOME', 'AWAY'])
df.to_csv("ppda2.csv", encoding='utf-8', index=False)
df.plot.bar()
plt.show()
Output: a CSV file and a chart.
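One possible refinement, not part of the answer itself, is to keep the match ID with each pair of values so that rows in the CSV and bars in the chart can be traced back to a match. The sketch below assumes a hypothetical helper named build_ppda_frame and uses made-up numbers rather than real data.

```python
import pandas as pd


def build_ppda_frame(rows):
    """Build a DataFrame indexed by match ID from (match_id, home, away) tuples."""
    df = pd.DataFrame(rows, columns=["match_id", "HOME", "AWAY"])
    return df.set_index("match_id")


# Made-up values for illustration only
sample_rows = [(1, 8.5, 12.25), (2, 10.0, 9.75)]
df = build_ppda_frame(sample_rows)
print(df)
```

With the match ID as the index, calling df.plot.bar() labels each pair of bars with its match ID, and df.to_csv("ppda2.csv") writes the ID as the first column as long as index=False is not passed.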