import time
import requests
from bs4 import BeautifulSoup

# headers was referenced but never defined in the original snippet;
# presumably it held browser-like request headers, e.g.:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

shoe = input('Shoe name: ')
URL = 'https://stockx.com/search?s=' + shoe
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
time.sleep(2)  # meant to give the webpage enough time to load so it wouldn't be scraped prematurely
test = soup.find(class_='BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm')
print(test)  # returns None
print(URL)   # prints the URL (which is the correct URL of the website I'm attempting to scrape)
I know I could easily do this with Selenium, but it is inefficient because it has to open a Chrome tab and navigate to the page. I'm trying to improve efficiency; my original "prototype" did use Selenium, but it kept getting detected as a bot and my entire script was blocked by captchas. Am I doing something wrong that makes the code return None, or is this particular page simply not scrapable? If it helps, the specific URL is https://stockx.com/search?s=yeezy
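One quick way to see what the server actually sends back before assuming the page loaded normally (a minimal sketch reusing page from the snippet above):
# Sketch: inspect the raw response; a block page can still return without an error.
print(page.status_code)   # a 200 here does not guarantee the real search page
print(page.text[:500])    # the first part of the returned HTML often reveals a block page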
Answer 0: (score: 0)
I tried your code, and here is the result.
Code
import requests
import bs4 as bs

shoe = 'yeezy'
URL = 'https://stockx.com/search?s=' + shoe
page = requests.get(URL)
soup = bs.BeautifulSoup(page.content, 'html.parser')
When I looked at the contents of soup, this is what I found.
Result
..
..
<div id="px-captcha">
</div>
<p> Access to this page has been denied because
we believe you are using automation tools to browse the website.</p>
..
..
So yes, I guess the developers don't want the site to be scraped.
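Given that, a minimal sketch (assuming the px-captcha markup shown above) for detecting the block programmatically before trying to parse results:
# Sketch: check whether the response is the PerimeterX captcha page rather than real search results.
if soup.find(id='px-captcha') is not None:
    print('Blocked: the server returned a captcha challenge instead of the search page.')
else:
    # the real markup is present; proceed with parsing
    results = soup.find(class_='BrowseSearchDescription__SearchConfirmation-sc-1mt8qyd-1 dcjzxm')
    print(results)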