Question

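Everything I need is on this web page: https://www.basketball-reference.com/teams/BOS/2019.html

I want to write something that steps through the names in the roster table, searches for each player's name in other specific tables on the page (for example, Totals), and then appends the new data to the end of that player's row.

This is what I have so far; it only fetches the roster information. Any direction or tips would be greatly appreciated.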

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from datetime import date, datetime, timedelta
from pandas import DataFrame
import csv
import calendar
import pandas as pd
import os


# Build the team page URL for the current calendar year
season = str(date.today().year)
month = calendar.month_name[date.today().month].lower()
teamUrl = "https://basketball-reference.com/teams/"

teamRoster = {'BOS': teamUrl + 'BOS/' + season + '.html'}

driver = webdriver.Chrome()

try:
    for url in teamRoster.values():
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, 'lxml')
        #teamName = soup.find(class_="teams*").find_all('span')[1]
        for i in soup.find_all('table', id='roster'):
            # Skip the header row, then collect the cell text of each player row
            for row1 in i.select('tr')[1:]:
                listA = [td.text for td in row1.select("td")]
                print(listA)
finally:
    # Close the browser even if a page fails to load
    driver.quit()
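
One possible direction for the lookup step, as a minimal sketch rather than a tested solution for this page: collect the rows of the second table the same way the roster rows are collected above, index them by player name, and extend each roster row with the matching values. The function name `append_table_data`, its parameters, and the sample rows are all hypothetical; the sketch also assumes the player name sits at index 0 of each row and is spelled identically in both tables.

```python
def append_table_data(roster_rows, other_rows, roster_name_idx=0, other_name_idx=0):
    """Return roster rows extended with the matching row from another table.

    Both arguments are lists of rows, each row a list of cell strings.
    The name positions are assumptions; adjust them to the real tables.
    """
    # Index the second table by player name, dropping the name cell itself
    stats_by_name = {}
    for row in other_rows:
        if len(row) > other_name_idx:
            name = row[other_name_idx].strip()
            stats_by_name[name] = row[:other_name_idx] + row[other_name_idx + 1:]

    merged = []
    for row in roster_rows:
        # Header or separator rows may yield empty lists; skip them
        if len(row) <= roster_name_idx:
            continue
        name = row[roster_name_idx].strip()
        # Players missing from the second table keep their original row
        merged.append(row + stats_by_name.get(name, []))
    return merged


# Made-up placeholder rows, not real data from the page
roster = [["Player A", "PG", "6-1"], ["Player B", "C", "6-10"], []]
totals = [["Player B", "25", "70"], ["Player A", "27", "67"]]

merged = append_table_data(roster, totals)
for row in merged:
    print(row)
```

In the scraper above, `roster` would be built by appending each `listA` to a list instead of printing it, and `totals` by repeating the same loop for the other table's `id`. If the names differ between tables (extra suffixes, for instance), they would need normalizing before the lookup.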

Data from multiple tables based on a common ID

0 answers: