Create a list of dates and insert them into URLs

Time: 2018-03-09 00:15:10

Tags: python python-3.x beautifulsoup

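I'm still fairly new to Python, so please bear with me, but here is my problem. I have a specific list of dates that I need to insert into a URL, and then I need to loop through each URL to scrape web data. I've done a similar task before, but in that case I didn't need to create a list. Here is an example: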

import pandas as pd
import requests
from bs4 import BeautifulSoup

url_template = "https://www.basketball-reference.com/play-index/lineup_finder.cgi?request=1&match=single&player_id=&offset={set}"

lineup_df = pd.DataFrame()

for set in range(0, 12600, 100):  # for each page (offset 0, 100, 200, ...)
    url = url_template.format(set=set)  # get the url

    page_request = requests.get(url)
    soup = BeautifulSoup(page_request.text, "lxml")

    # the second header row holds the column names
    column_headers = [th.getText() for th in
                      soup.findAll('tr', limit=2)[1].findAll('th')]

    # get lineup data (skip the two header rows)
    data_rows = soup.findAll('tr')[2:]
    lineup_data = [[td.getText() for td in row.findAll(['td', 'th'])]
                   for row in data_rows]

    # Turn page data into a DataFrame
    page_df = pd.DataFrame(lineup_data, columns=column_headers)

    # Append to the big dataframe (DataFrame.append was removed in pandas 2.0)
    lineup_df = pd.concat([lineup_df, page_df], ignore_index=True)

So basically, what I'm trying to do is replace the set values produced by range() with a list of dates. Hopefully that makes sense.
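If the dates are irregular, the list can simply be written out by hand, e.g. dates = ["2018-01-15", "2018-02-01"]. If they follow a regular pattern, they can be generated instead. A minimal sketch using only the standard library; the helper name date_range_strings, the example start/end dates and the YYYY-MM-DD string format are assumptions, not something the site is known to require:

```python
from datetime import date, timedelta


def date_range_strings(start, end, step_days=1, fmt="%Y-%m-%d"):
    """Hypothetical helper: formatted date strings from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=d)).strftime(fmt)
            for d in range(0, days + 1, step_days)]


dates = date_range_strings(date(2018, 3, 1), date(2018, 3, 5))
print(dates)  # ['2018-03-01', '2018-03-02', '2018-03-03', '2018-03-04', '2018-03-05']
```

Setting step_days=7 would give one date per week; fmt can be changed to whatever format the target URL actually expects.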

1 Answer:

Answer 0 (score: 1)

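Your code builds each URL as it goes, but it doesn't capture them in a list; a list comprehension will do that. You can then loop over url_list and run the rest of your code for each URL created.
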
url_template = "https://www.basketball-reference.com/play-index/lineup_finder.cgi?request=1&match=single&player_id=&offset={offset}"
url_list = [url_template.format(offset=offset) for offset in range(0, 12600, 100)]
for url in url_list:
    # the rest of your scraping code goes here
    pass
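The same pattern works with dates: iterate over the date list instead of range(). A minimal sketch, assuming the target site takes a date query parameter; the example.com URL, the parameter name date and the three dates below are hypothetical placeholders, not the real basketball-reference API:

```python
from datetime import date

# Hypothetical: the specific dates to scrape
dates = [date(2018, 1, 15), date(2018, 2, 1), date(2018, 3, 9)]

# Hypothetical template; the real query parameter name and format may differ.
# "{date:%Y-%m-%d}" lets str.format call strftime on each date object.
url_template = "https://www.example.com/games?date={date:%Y-%m-%d}"

# Same list-comprehension pattern, iterating over dates instead of offsets
url_list = [url_template.format(date=d) for d in dates]

for url in url_list:
    print(url)  # replace with requests.get(url) and the parsing code
```

Putting the date format inside the template keeps the list itself as real date objects; if the dates are already strings, use a plain {date} placeholder instead.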