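I am trying to get a value from an element selected by its class. Sometimes find returns the value I need, but the next time I run it, it no longer works.
Code: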
import requests
from bs4 import BeautifulSoup
url = 'https://beru.ru/catalog/molotyi-kofe/76321/list'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
item_count = (soup.find('div', class_='_2StYqKhlBr')).text.split()[4]
print(item_count)
Answer 0 (score: 0)
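The reason you only sometimes get the value is that the site is protected by a CAPTCHA. When a request is blocked by the CAPTCHA, the URL you end up at looks like this: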
https://beru.ru/showcaptcha?retpath=https://beru.ru/catalog/molotyi-kofe/76321/list?ncrnd=4561_aa1b86c2ca77ae2b0831c4d95b9d85a4&t=0/1575204790/b39289ef083d539e2a4630548592a778&s=7e77bfda14c97f6fad34a8a654d9cd16
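One way to detect this case before parsing is to look at the URL the request finally landed on. The sketch below assumes the block always shows up as a redirect to a path starting with /showcaptcha, as in the URL above; the helper name is_captcha_url is hypothetical and not part of the original code.

```python
from urllib.parse import urlparse


def is_captcha_url(final_url):
    """Return True if the URL points at the CAPTCHA page.

    Assumption: a blocked request ends up at a path that starts
    with /showcaptcha, as in the example URL above.
    """
    return urlparse(final_url).path.startswith('/showcaptcha')
```

With requests, the final URL after redirects is available as page.url, so the check in the question's script would be is_captcha_url(page.url) before calling soup.find.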
You can verify this by parsing the response content:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://beru.ru/catalog/molotyi-kofe/76321/list')
soup = BeautifulSoup(r.text, 'html.parser')

# Printed when the normal catalog page was served
for item in soup.find_all('div', attrs={'class': '_2StYqKhlBr _1wAXjGKtqe'}):
    print(item)

# Printed when the CAPTCHA page was served instead
for item in soup.find_all('div', attrs={'class': 'captcha__image'}):
    for captcha in item.find_all('img'):
        print(captcha.get('src'))
When the request is blocked, you will get the CAPTCHA image link:
https://beru.ru/captchaimg?aHR0cHM6Ly9leHQuY2FwdGNoYS55YW5kZXgubmV0L2ltYWdlP2tleT0wMEFMQldoTnlaVGh3T21WRmN4NWFJRUdYeWp2TVZrUCZzZXJ2aWNlPW1hcmtldGJsdWU,_0/1575206667/b49556a86deeece9765a88f635c7bef2_df12d7a36f0e2d36bd9c9d94d8d9e3d7
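Because find returns None when the element is missing, the original one-liner raises an AttributeError on the CAPTCHA page. A minimal sketch of a guarded version follows; the function name extract_item_count is hypothetical, and the class names and the word index 4 are taken from the code above, so they may stop matching if the site changes its markup.

```python
from bs4 import BeautifulSoup


def extract_item_count(html):
    """Return the item count as a string, or None if it cannot be found."""
    soup = BeautifulSoup(html, 'html.parser')

    # The CAPTCHA page contains this block instead of the catalog.
    if soup.find('div', class_='captcha__image') is not None:
        return None

    block = soup.find('div', class_='_2StYqKhlBr')
    if block is None:
        return None

    # The question's code takes the fifth whitespace-separated word.
    words = block.text.split()
    return words[4] if len(words) > 4 else None
```

Calling extract_item_count(page.text) then gives None for a blocked request, which the caller can treat as a signal to retry later rather than crash.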