Web scraping Adidas does not return the HTML text

Time: 2020-08-07 04:11:54

Tags: python html python-3.x web-scraping python-requests

I am trying to scrape shoes from the Adidas website with the following code:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request('https://www.adidas.com/us/men-shoes', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage)
print(webpage)

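For some reason the code fails to retrieve this page's HTML, even though it works for other URLs (for example "http://www.python.org"). Could this be a security measure? If so, how can I scrape the shoes from the site?

I do not get any error or any response. The code just seems to run indefinitely.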

1 answer:

Answer 0 (score: 3)

The following code worked for me once I sent the same User-Agent that appears in the browser's request headers:

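import requests
from bs4 import BeautifulSoup

# Send the full User-Agent string of a real browser instead of the bare 'Mozilla/5.0'
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
# The timeout makes the request raise an error after 15 seconds instead of hanging
html_page = requests.get("https://www.adidas.com/us/men-shoes", headers=hdr, timeout=15)

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(html_page.content, 'html.parser')
print(soup)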
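
If you would rather stay with urllib as in the question, the same two ingredients (a full browser User-Agent and a timeout) can be applied there too. The following is only a sketch under assumptions: fetch_html, BROWSER_UA and the opener parameter are hypothetical names introduced for illustration, and whether the site accepts this User-Agent today is not guaranteed.

```python
from urllib.request import Request, urlopen

# Full browser User-Agent string copied from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
)


def fetch_html(url, user_agent=BROWSER_UA, timeout=15, opener=urlopen):
    """Hypothetical helper: return the page body as bytes, or None on failure.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        # Without a timeout, urlopen can block indefinitely on a silent server.
        with opener(req, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        print(f"Request failed: {exc}")
        return None
```

Calling fetch_html("https://www.adidas.com/us/men-shoes") then either returns the raw HTML bytes or prints the reason for the failure and returns None, so a blocked request shows up as an error rather than a script that never finishes.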
