BeautifulSoup find intermittently returns None

Time: 2019-12-01 01:08:13

Tags: python beautifulsoup

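I am trying to get a value from an element selected by its class. Sometimes find returns the value I need, but on the next run it no longer works.

Code: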

import requests
from bs4 import BeautifulSoup

url = 'https://beru.ru/catalog/molotyi-kofe/76321/list'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

item_count = (soup.find('div', class_='_2StYqKhlBr')).text.split()[4]

print(item_count)

1 Answer:

Answer 0 (score: 0):

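The reason you only sometimes get a value is that the site is protected by a CAPTCHA. When a request is blocked by the CAPTCHA, the URL becomes something like this: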

https://beru.ru/showcaptcha?retpath=https://beru.ru/catalog/molotyi-kofe/76321/list?ncrnd=4561_aa1b86c2ca77ae2b0831c4d95b9d85a4&t=0/1575204790/b39289ef083d539e2a4630548592a778&s=7e77bfda14c97f6fad34a8a654d9cd16
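One way to detect this case before parsing is to look at the final URL of the response. The following is a minimal sketch, assuming the block always shows up as a redirect to a page whose path is /showcaptcha (as in the URL above); is_captcha_url is a hypothetical helper name, not part of requests or BeautifulSoup:

```python
from urllib.parse import urlparse


def is_captcha_url(url):
    """Return True if the URL path looks like the CAPTCHA page seen above."""
    # Assumption: the CAPTCHA page lives at /showcaptcha, as in the example URL.
    return urlparse(url).path.rstrip('/').endswith('/showcaptcha')


blocked = 'https://beru.ru/showcaptcha?retpath=https://beru.ru/catalog/molotyi-kofe/76321/list'
normal = 'https://beru.ru/catalog/molotyi-kofe/76321/list'
print(is_captcha_url(blocked))
print(is_captcha_url(normal))
```

With requests, the final URL after redirects is available as the url attribute of the response, so in the question's code the check would be is_captcha_url(page.url).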

You can verify this by parsing the response content:

import requests
from bs4 import BeautifulSoup


r = requests.get(
    'https://beru.ru/catalog/molotyi-kofe/76321/list')
soup = BeautifulSoup(r.text, 'html.parser')

# Prints the target div when the real catalog page was returned.
for item in soup.find_all('div', attrs={'class': '_2StYqKhlBr _1wAXjGKtqe'}):
    print(item)

# Prints the CAPTCHA image URL when the request was blocked instead.
for item in soup.find_all('div', attrs={'class': 'captcha__image'}):
    for captcha in item.find_all('img'):
        print(captcha.get('src'))

When the request is blocked, you will get the CAPTCHA image link:

https://beru.ru/captchaimg?aHR0cHM6Ly9leHQuY2FwdGNoYS55YW5kZXgubmV0L2ltYWdlP2tleT0wMEFMQldoTnlaVGh3T21WRmN4NWFJRUdYeWp2TVZrUCZzZXJ2aWNlPW1hcmtldGJsdWU,_0/1575206667/b49556a86deeece9765a88f635c7bef2_df12d7a36f0e2d36bd9c9d94d8d9e3d7
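Because find returns None when the element is missing, the one-liner in the question raises AttributeError on a CAPTCHA page. Below is a minimal sketch of a guarded version that works on an HTML string, so it can be tried without network access. extract_item_count is a hypothetical helper, and the sample HTML strings are made up for illustration; only the class names come from the code above:

```python
from bs4 import BeautifulSoup


def extract_item_count(html):
    """Return the 5th whitespace-separated token of the counter div, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # A CAPTCHA page contains this div instead of the catalog markup.
    if soup.find('div', class_='captcha__image') is not None:
        return None
    counter = soup.find('div', class_='_2StYqKhlBr')
    if counter is None:
        return None
    tokens = counter.text.split()
    # The question's code takes index 4; guard against shorter text.
    return tokens[4] if len(tokens) > 4 else None


# Made-up sample markup for illustration only.
catalog_html = '<div class="_2StYqKhlBr _1wAXjGKtqe">a b c d 42 e</div>'
captcha_html = '<div class="captcha__image"><img src="/captchaimg?x"></div>'
print(extract_item_count(catalog_html))
print(extract_item_count(captcha_html))
```

A None result then tells the caller to retry later or slow down, rather than crashing on .text.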