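I'm new to Python and web scraping. I'm trying to import live college football scores from ESPN into a pandas DataFrame using BeautifulSoup. I've searched high and low, but I can't seem to get the scores imported correctly.
Once the scores are in a DataFrame, I want to export the results to Excel.
Here is what I have so far. The output is a single column listing all the teams, followed by all the scores.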
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome(executable_path=r'C:\Users\Jims Maximus Hero\Desktop\chromedriver.exe')
driver.get("https://www.espn.com/college-football/scoreboard/_/group/80/year/2019/seasontype/2/week/11")
html = driver.page_source
soup = BeautifulSoup(html, "lxml")

for tag in soup.find_all("span", {"class": "sb-team-short"}):
    print(tag.text)
for tag in soup.find_all("td", {"class": "total"}):
    print(tag.text)
Thanks for your help.
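The single-column output comes from running the two loops one after the other: every team name is printed before any score. A minimal sketch of pairing them instead, assuming both selectors return their matches in the same page order and in equal numbers (not verified against the ESPN page). The helper name `pair_teams_with_scores` and the sample values are hypothetical placeholders for the scraped text:

```python
def pair_teams_with_scores(teams, scores):
    """Return one {"team", "score"} row per team, keeping page order."""
    if len(teams) != len(scores):
        # A mismatch means the two selectors did not line up one-to-one
        raise ValueError(f"got {len(teams)} teams but {len(scores)} scores")
    return [{"team": t.strip(), "score": s.strip()} for t, s in zip(teams, scores)]


# Placeholder values standing in for
# [tag.text for tag in soup.find_all("span", {"class": "sb-team-short"})] and
# [tag.text for tag in soup.find_all("td", {"class": "total"})]
teams = ["Team A", "Team B", "Team C", "Team D"]
scores = ["21", "14", "35", "28"]

rows = pair_teams_with_scores(teams, scores)
for row in rows:
    print(row["team"], row["score"])
```

Passing `rows` to `pd.DataFrame(rows)` would then give a frame with a `team` column and a `score` column instead of one long column.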
Answer 0 (score: 0)
Try this:
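# Note: find_element_by_xpath was removed in Selenium 4.3; there the call is find_element(By.XPATH, ...)
driver.get('https://www.espn.com/college-football/scoreboard/_/group/80/year/2019/seasontype/2/week/11')

# read_html returns a list of DataFrames; this reads a single game's table by its article id
single_game = pd.read_html(driver.find_element_by_xpath('//*[@id="401119297"]/div/div/section/div/table').get_attribute('outerHTML'))

# Collect every game's <article> element, then read each game's table by the article's id
results = driver.find_elements_by_xpath("//article[contains(@class, 'scoreboard football')]")
df = pd.DataFrame()
for result in results:
    score = pd.read_html(driver.find_element_by_xpath('//*[@id="'+str(result.get_attribute('id'))+'"]/div/div/section/div/table').get_attribute('outerHTML'))
    # Keep only rows with at least 4 non-NaN values
    score = score[0].dropna(axis=0, thresh=4)
    df = pd.concat([df, score])
print(df)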
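The question also asks about getting the result into Excel, which the code above does not cover. A minimal sketch, assuming the per-game tables look roughly like the placeholder frames below (the column names `team` and `T` and the values are hypothetical, not taken from ESPN) and that an Excel writer engine such as openpyxl is installed. The helpers `combine_games` and `export_scores` are illustrative names:

```python
import pandas as pd


def combine_games(frames):
    """Stack per-game tables into one frame with a fresh 0..n-1 index."""
    # ignore_index avoids the repeated 0, 1, 0, 1, ... index that plain concat keeps
    return pd.concat(frames, ignore_index=True)


def export_scores(df, path="scores.xlsx"):
    """Write the combined scores to an Excel file without the index column."""
    # to_excel needs an Excel writer engine (for example openpyxl) for .xlsx files
    df.to_excel(path, sheet_name="scores", index=False)


# Placeholder frames standing in for the tables parsed with pd.read_html
game1 = pd.DataFrame({"team": ["Team A", "Team B"], "T": [21, 14]})
game2 = pd.DataFrame({"team": ["Team C", "Team D"], "T": [35, 28]})

combined = combine_games([game1, game2])
print(combined)

# export_scores(combined)  # uncomment to write scores.xlsx
```

Collecting the frames in a list and concatenating once is usually cheaper than calling `pd.concat` inside the loop, though for a single scoreboard page the difference is negligible.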