import requests
from bs4 import BeautifulSoup

base_url = 'http://www.baseball-reference.com'  # base url for concatenation
data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
soup = BeautifulSoup(data.content, "html.parser")
url = []
for link in soup.find_all('a'):
    # skip anchors without an href and anything that is not a boxscore link
    if not link.has_attr('href'):
        continue
    if link.get_text() != 'boxscore':
        continue
    url = base_url + link['href']
When I print url inside the loop, this is what I get:
http://www.baseball-reference.com/boxes/BAL/BAL201403310.shtml
http://www.baseball-reference.com/boxes/BAL/BAL201404020.shtml
http://www.baseball-reference.com/boxes/BAL/BAL201404030.shtml
http://www.baseball-reference.com/boxes/DET/DET201404040.shtml ......
http://www.baseball-reference.com/boxes/KCA/KCA201410150.shtml
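If I iterate over this, it gives me single elements, and I understand why, but I need to build a vector in which each element is the full URL of one box score. What is the best way to do this? Should I append the first 62 elements, then the next 62, and so on? I'm not sure what the best approach is.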
Answer 0 (score: 2)
Change
    url = base_url + link['href']
to
    url.append(base_url + link['href'])
so that each URL is added to the list instead of replacing it with a single string.
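
To make the difference concrete, here is a minimal sketch of the corrected loop that runs without any network access. It assumes the links have already been reduced to (text, href) pairs; the helper name collect_boxscore_urls and the sample data are hypothetical and not part of the original code.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.baseball-reference.com"


def collect_boxscore_urls(links, base_url=BASE_URL):
    """Return a list with one full URL per link whose text is 'boxscore'.

    `links` is an iterable of (text, href) pairs; href may be None.
    (Hypothetical helper, used here only for illustration.)
    """
    urls = []
    for text, href in links:
        # Same filters as the original loop: need an href and the right text.
        if href is None or text != "boxscore":
            continue
        # append() grows the list; plain assignment would overwrite it
        # with a single string on every pass.
        urls.append(urljoin(base_url, href))
    return urls


# Made-up sample standing in for the anchors found on the schedule page.
sample_links = [
    ("boxscore", "/boxes/BAL/BAL201403310.shtml"),
    ("preview", "/some/other/page.shtml"),  # hypothetical non-boxscore link
    ("boxscore", None),                     # anchor without an href
    ("boxscore", "/boxes/BAL/BAL201404020.shtml"),
]

boxscore_urls = collect_boxscore_urls(sample_links)
for u in boxscore_urls:
    print(u)  # each element is a whole URL, not a single character
```

With the original code, the pairs could be produced by something like [(a.get_text(), a.get('href')) for a in soup.find_all('a')]. There is no need to slice the result 62 characters at a time: once the URLs are appended to a list, iterating over it yields one complete URL per element.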