Question

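I'm new to Python and web scraping. I'm trying to import live college football scores from ESPN into a pandas DataFrame using BeautifulSoup. I've searched high and low and can't seem to get the scores imported correctly.

Once the data is in a DataFrame, I want to export the results to Excel.

Here is what I have so far. The output is a single column listing all the teams, followed by all the scores.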


from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd


driver = webdriver.Chrome(executable_path=r'C:\Users\Jims Maximus Hero\Desktop\chromedriver.exe')
driver.get("https://www.espn.com/college-football/scoreboard/_/group/80/year/2019/seasontype/2/week/11")

html = driver.page_source
soup = BeautifulSoup(html, "lxml")

for tag in soup.find_all("span", {"class":"sb-team-short"}):
    print (tag.text)

for tag in soup.find_all("td", {"class":"total"}):
    print (tag.text)

Thanks for any help.

Answer 1

Try this:

from selenium.webdriver.common.by import By

driver.get('https://www.espn.com/college-football/scoreboard/_/group/80/year/2019/seasontype/2/week/11')

# Each game is an <article> whose class contains "scoreboard football";
# its id attribute is used to locate that game's score table.
results = driver.find_elements(By.XPATH, "//article[contains(@class, 'scoreboard football')]")
df = pd.DataFrame()
for result in results:
    game_id = result.get_attribute('id')
    table = driver.find_element(By.XPATH, '//*[@id="' + game_id + '"]/div/div/section/div/table')
    # read_html returns a list of DataFrames; the table is at index 0
    score = pd.read_html(table.get_attribute('outerHTML'))
    # Keep only rows that have at least 4 non-NaN values
    score = score[0].dropna(axis=0, thresh=4)
    df = pd.concat([df, score])
print(df)

Output:

[Image: pandas DataFrame of the football score data]
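
The question also asks about getting the results into Excel, which the code above does not cover. A minimal sketch of that last step, assuming an Excel writer engine such as openpyxl is installed: the small DataFrame below, its column names, the helper name export_scores, and the file name scores.xlsx are all made up for illustration and stand in for the df built above.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame built above; the column names
# and values are invented for illustration, not real ESPN data.
df = pd.DataFrame(
    {
        "team": ["Team A", "Team B"],
        "Q1": [7, 3],
        "Q2": [0, 10],
        "Q3": [14, 0],
        "Q4": [3, 7],
        "T": [24, 20],
    }
)


def export_scores(frame: pd.DataFrame, path: str) -> None:
    """Write the scores to an .xlsx file without the pandas index column."""
    # pd.concat in a loop leaves repeated index values (0, 1, 0, 1, ...),
    # so reset the index before writing. Writing .xlsx requires an Excel
    # engine such as openpyxl to be installed.
    frame.reset_index(drop=True).to_excel(path, index=False, sheet_name="scores")


if __name__ == "__main__":
    export_scores(df, "scores.xlsx")
```

Passing index=False keeps the row labels out of the spreadsheet, so the sheet contains only the team and score columns.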
