Python request returns no content when trying to scrape a specific web page

Date: 2020-03-29 03:38:15

Tags: python html python-requests

import time

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # headers was used below without being shown; a definition like this is assumed

shoe = input('Shoe name: ')

URL = 'https://stockx.com/search?s=' + shoe

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

time.sleep(2)  # this was to ensure the webpage had enough time to load, so it wouldn't scrape a prematurely loaded page (note: requests.get() has already returned the full response by this point, so the wait has no effect)

test = soup.find(class_='BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm')

print(test)  # returns None
print(URL)   # prints the URL (which is the correct URL of the website I'm attempting to scrape)

I know I could easily do this with Selenium, but loading a Chrome tab and navigating to the page is inefficient. I'm trying to make this more efficient; my original "prototype" did use Selenium, but it kept getting detected as a bot, and my whole script was blocked by a captcha. Am I doing something wrong that makes the code return None, or can this particular page simply not be scraped? If needed, the specific URL is https://stockx.com/search?s=yeezy
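Separately from any bot detection, the class name being searched for is itself fragile: names like `BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm` are generated by styled-components, and the hashed parts change between site deployments. A minimal sketch of matching on the stable portion of the name instead, using inline sample HTML (an assumption, since the live page cannot be fetched here):

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for the live page's search-confirmation element.
html = """
<div class="BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm">
  Showing results for "yeezy"
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute-substring selector: matches any element whose class
# attribute contains the stable "SearchConfirmation" fragment, regardless
# of the generated hash suffixes.
node = soup.select_one('[class*="SearchConfirmation"]')
print(node.get_text(strip=True))
```

This still only works if the server returns the real page rather than a block page, but it at least survives styled-components regenerating its class hashes.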

1 answer:

Answer 0 (score: 0)

I tried your code, and here is the result.

Code

import requests
import bs4 as bs

shoe = 'yeezy'
URL = 'https://stockx.com/search?s=' + shoe
page = requests.get(URL)
soup = bs.BeautifulSoup(page.content, 'html.parser')

When I looked at the contents of soup, this is what I found.

Result

..
..

<div id="px-captcha">
</div>
<p> Access to this page has been denied because 
    we believe you are using automation tools to browse the website.</p>

..
..

Yes, I guess the developers don't want this site to be scraped.
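The `<div id="px-captcha">` element in that response is the tell-tale marker of a PerimeterX block page. Rather than parsing for product data and silently getting None, a script can check for that marker first. A small sketch, where `is_blocked` is a hypothetical helper name and the sample HTML mirrors the response shown above:

```python
from bs4 import BeautifulSoup


def is_blocked(html: str) -> bool:
    """Return True if the HTML looks like a PerimeterX block page."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(id="px-captcha") is not None


# The block page seen in the answer above.
blocked = '<div id="px-captcha"></div><p>Access to this page has been denied.</p>'
print(is_blocked(blocked))            # True
print(is_blocked("<p>results</p>"))   # False
```

Failing fast like this makes the difference between "my selector is wrong" and "the server refused to serve me the page" obvious.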