Question

import requests
from bs4 import BeautifulSoup

base_url = 'http://www.baseball-reference.com' # base url for concatenation
data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
soup = BeautifulSoup(data.content, 'html.parser') # name the parser explicitly

url = []
for link in soup.find_all('a'):

    if not link.has_attr('href'):
        continue

    if link.get_text() != 'boxscore':
        continue

    url = base_url + link['href']

When I print the URLs, this is what I get:

http://www.baseball-reference.com/boxes/BAL/BAL201403310.shtml

http://www.baseball-reference.com/boxes/BAL/BAL201404020.shtml

http://www.baseball-reference.com/boxes/BAL/BAL201404030.shtml

http://www.baseball-reference.com/boxes/DET/DET201404040.shtml ......

http://www.baseball-reference.com/boxes/KCA/KCA201410150.shtml

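If I iterate over this, it gives me individual characters, and I understand why, but I need to build a vector in which each element is the full URL of one box score. What is the best way to do this? Should I append the first 62 characters, then the next 62, and so on? I'm not sure what the best approach is.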

Answer 1

Change

url = base_url + link['href']

to

url.append(base_url + link['href'])
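
To see why this works, here is a minimal sketch that runs without any network access. The hrefs list is a hypothetical stand-in for the link['href'] values from the scraped page (the paths are copied from the output in the question), and the names chars and urls are introduced only for illustration.

```python
base_url = 'http://www.baseball-reference.com'

# Hypothetical stand-ins for link['href']; paths taken from the question's output.
hrefs = [
    '/boxes/BAL/BAL201403310.shtml',
    '/boxes/BAL/BAL201404020.shtml',
    '/boxes/BAL/BAL201404030.shtml',
]

# Plain assignment rebinds the name on every pass, so the empty list is
# discarded and only the last string survives the loop.
url = []
for href in hrefs:
    url = base_url + href
chars = list(url)  # iterating over a string yields single characters

# append() keeps the list and adds one complete URL per pass.
urls = []
for href in hrefs:
    urls.append(base_url + href)

print(len(chars))  # 62, the length of one URL
print(urls[0])     # first complete box score URL
```

If the loop does nothing besides build the list, a list comprehension such as [base_url + href for href in hrefs] should give the same result.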

Making a vector from unformatted elements in Python
