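My for/in loop does not seem to request the HTML page for every URL. Instead, it only fetches the last URL in the list.
I have looked around the internet and used the for/in loop that people suggest, but for some reason it does not work and I cannot figure out the fix.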
# Beautiful Soup Functions
import requests
from bs4 import BeautifulSoup
# URLs to visit
base_url = 'https://www.espn.com/soccer/league/_/name/'
url_list = ['esp.1', 'ita.1', 'eng.1']
# URL loop
for url in url_list:
    print(base_url+url)
r = requests.get(base_url+url)
soup = BeautifulSoup(r.text, 'lxml')
print(soup.title.string)
# loop through standings table and pull data
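The expected result is that the for/in loop visits each URL and pulls back its HTML, so that I can then run my other code (looping through the standings) to pull the table from each page. However, the for/in loop does not seem to iterate: it only pulls back the HTML for the last concatenated item, eng.1. What I really don't understand is why print(base_url + url) prints all three concatenated URLs, yet print(soup.title.string) shows that only one URL was requested.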
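Answer 0 (score: 0)
See below (working code). Every statement that should run once per URL, including the request and the parsing, has to be indented inside the for loop body. The parsed pages are also stored in a dict so they are still available after the loop ends.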
# Beautiful Soup Functions
import requests
from bs4 import BeautifulSoup
# URLs to visit
base_url = 'https://www.espn.com/soccer/league/_/name/'
url_list = ['esp.1', 'ita.1', 'eng.1']
data = {}
for url in url_list:
    print(base_url+url)
    r = requests.get(base_url+url)
    soup = BeautifulSoup(r.text, 'lxml')
    print(soup.title.string)
    # keep each parsed page, keyed by its full URL
    data[base_url+url] = soup
# now you can work with 'data'
Output
https://www.espn.com/soccer/league/_/name/esp.1
Spanish Primera División News, Stats, Scores - ESPN
https://www.espn.com/soccer/league/_/name/ita.1
Italian Serie A News, Stats, Scores - ESPN
https://www.espn.com/soccer/league/_/name/eng.1
English Premier League News, Stats, Scores - ESPN
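The symptom in the question (every URL printed, but only one title) is what Python does when only the first print is indented under the for statement. The offline sketch below illustrates this, assuming that was the indentation in the original script; the names requested_a and requested_b are made up for the illustration, and no network request is made.

```python
url_list = ['esp.1', 'ita.1', 'eng.1']

# Version A: only the print is inside the loop body.
requested_a = []
for url in url_list:
    print('looping over', url)
requested_a.append(url)  # runs once, after the loop, when url is 'eng.1'

# Version B: the append is indented, so it runs on every iteration.
requested_b = []
for url in url_list:
    print('looping over', url)
    requested_b.append(url)

print(requested_a)  # ['eng.1']
print(requested_b)  # ['esp.1', 'ita.1', 'eng.1']
```

After a for loop finishes, the loop variable keeps its last value, which is why the un-indented request in Version A silently uses eng.1 instead of raising an error.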
Answer 1 (score: 0)
You can create an empty list and append what you need to it inside the loop.
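A minimal sketch of that idea follows. The helper collect_pages and the stand-in fake_fetch are hypothetical names introduced here so the example runs offline; in the real script the fetch argument would presumably wrap requests.get and BeautifulSoup, as in the question.

```python
def collect_pages(base_url, url_list, fetch):
    """Call fetch(full_url) for every entry and return (full_url, result) pairs."""
    pages = []  # start with an empty list
    for url in url_list:
        full_url = base_url + url
        # everything indented at this level runs once per URL
        pages.append((full_url, fetch(full_url)))
    return pages


def fake_fetch(full_url):
    # Stand-in for the real request; returns a placeholder string
    # instead of downloading and parsing the page.
    return '<html for ' + full_url + '>'


pages = collect_pages('https://www.espn.com/soccer/league/_/name/',
                      ['esp.1', 'ita.1', 'eng.1'],
                      fake_fetch)
for full_url, page in pages:
    print(full_url, page)
```

Once the loop has finished, pages holds one entry per URL, so the standings table can be processed in a second loop over the list.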