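I'm still fairly new to Python, so please bear with me, but here is my problem. I have a list of specific dates that I need to insert into a URL, and then I need to loop over each URL to scrape web data. I have done a similar task before, but in that case I did not need to create a list. Here is an example.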
import pandas as pd
import requests
from bs4 import BeautifulSoup

url_template = "https://www.basketball-reference.com/play-index/lineup_finder.cgi?request=1&match=single&player_id=&offset={set}"
lineup_df = pd.DataFrame()
for set in range(0, 12600, 100):  # for each page
    url = url_template.format(set=set)  # get the url
    page_request = requests.get(url)
    soup = BeautifulSoup(page_request.text, "lxml")
    column_headers = [th.getText() for th in
                      soup.findAll('tr', limit=2)[1].findAll('th')]
    # get lineup data
    data_rows = soup.findAll('tr')[2:]
    lineup_data = [[td.getText() for td in data_rows[i].findAll(['td', 'th'])]
                   for i in range(len(data_rows))]
    # Turn page data into a DataFrame
    page_df = pd.DataFrame(lineup_data, columns=column_headers)
    # Append to the big dataframe (DataFrame.append was removed in pandas 2.0)
    lineup_df = pd.concat([lineup_df, page_df], ignore_index=True)
So basically, what I want to do is replace the `set` values generated by the range with a list of dates. I hope that makes sense.
Answer 0 (score: 1)
Your code builds each URL as the loop runs, but it never stores them in a list. A list comprehension will do that job, and you can then loop over url_list to process each URL that was created.
url_template = "https://www.basketball-reference.com/play-index/lineup_finder.cgi?request=1&match=single&player_id=&offset={offset}"
url_list = [url_template.format(offset=offset) for offset in range(0, 12600, 100)]
for url in url_list:
    pass  # the rest of the code here
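The same idea should carry over to dates. The sketch below is only an illustration: the question does not say which site or query parameter takes the date, so the `example.com` template, the `{date}` placeholder, the sample dates, and the helper names `date_range` and `build_urls` are all assumptions. Swap in the real template and date format for your site.

```python
from datetime import date, timedelta

# Hypothetical template: the real URL and its date parameter are not given
# in the question, so this address and "{date}" are placeholders only.
url_template = "https://www.example.com/boxscores/?date={date}"

# Option 1: a hand-written list of specific dates (sample values).
dates = [date(2018, 1, 3), date(2018, 1, 5), date(2018, 1, 12)]


def date_range(start, end):
    """Option 2: every consecutive day from start to end, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def build_urls(template, dates):
    """Format each date as YYYY-MM-DD and substitute it into the template."""
    return [template.format(date=d.isoformat()) for d in dates]


url_list = build_urls(url_template, dates)
for url in url_list:
    print(url)  # replace with the requests/BeautifulSoup code from the question
```

If the site expects a different format (for example separate month, day, and year parameters), `d.strftime(...)` or the `d.month`, `d.day`, and `d.year` attributes can be used in place of `d.isoformat()`.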