Python request returns no content when trying to scrape a specific web page

Date: 2020-03-29 03:38:15

Tags: python html python-requests

import time

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # headers was used below without being shown; a definition like this is assumed

shoe = input('Shoe name: ')

URL = 'https://stockx.com/search?s=' + shoe

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

time.sleep(2)  # this was to ensure the webpage had enough time to load, so it wouldn't scrape a prematurely loaded page (note: requests.get() has already returned the full response by this point, so the wait has no effect)

test = soup.find(class_='BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm')

print(test)  # returns None
print(URL)   # prints the URL (which is the correct URL of the website I'm attempting to scrape)

I know I could easily do this with Selenium, but loading a Chrome tab and navigating to the page is inefficient. I'm trying to make this more efficient; my original "prototype" did use Selenium, but it kept getting detected as a bot, and my whole script was blocked by a captcha. Am I doing something wrong that makes the code return None, or can this particular page simply not be scraped? If needed, the specific URL is https://stockx.com/search?s=yeezy
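Separately from any bot detection, the class name being searched for is itself fragile: names like `BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm` are generated by styled-components, and the hashed parts change between site deployments. A minimal sketch of matching on the stable portion of the name instead, using inline sample HTML (an assumption, since the live page cannot be fetched here):

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for the live page's search-confirmation element.
html = """
<div class="BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm">
  Showing results for "yeezy"
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute-substring selector: matches any element whose class
# attribute contains the stable "SearchConfirmation" fragment, regardless
# of the generated hash suffixes.
node = soup.select_one('[class*="SearchConfirmation"]')
print(node.get_text(strip=True))
```

This still only works if the server returns the real page rather than a block page, but it at least survives styled-components regenerating its class hashes.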

1 answer:

Answer 0 (score: 0)

I tried your code, and here is the result.

Code

import requests
import bs4 as bs

shoe = 'yeezy'
URL = 'https://stockx.com/search?s=' + shoe
page = requests.get(URL)
soup = bs.BeautifulSoup(page.content, 'html.parser')

When I looked at the contents of soup, this is what I found.

Result

..
..

<div id="px-captcha">
</div>
<p> Access to this page has been denied because 
    we believe you are using automation tools to browse the website.</p>

..
..

Yes, I guess the developers don't want this site to be scraped.
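The `<div id="px-captcha">` element in that response is the tell-tale marker of a PerimeterX block page. Rather than parsing for product data and silently getting None, a script can check for that marker first. A small sketch, where `is_blocked` is a hypothetical helper name and the sample HTML mirrors the response shown above:

```python
from bs4 import BeautifulSoup


def is_blocked(html: str) -> bool:
    """Return True if the HTML looks like a PerimeterX block page."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(id="px-captcha") is not None


# The block page seen in the answer above.
blocked = '<div id="px-captcha"></div><p>Access to this page has been denied.</p>'
print(is_blocked(blocked))            # True
print(is_blocked("<p>results</p>"))   # False
```

Failing fast like this makes the difference between "my selector is wrong" and "the server refused to serve me the page" obvious.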