Web scraping customer reviews

Time: 2021-04-24 00:13:24

Tags: python web web-scraping

I'm trying to scrape customer reviews from G2 as part of a work project, but I get a 403 error. Any ideas on how to proceed? I'm fairly new to web scraping. Thanks a lot!

HTTPError: HTTP Error 403: Forbidden

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

req = Request("https://www.g2.com/products/google-drive/reviews", headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()  # raises HTTPError 403 here
webpage = web_byte.decode('utf-8')
parsed_html = BeautifulSoup(webpage, features="lxml")
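To see what the server actually sent back instead of only the exception, one option is to catch `urllib.error.HTTPError`: the exception object is itself a response, so its status code, headers and body can still be read. This is a minimal sketch; `fetch` is a hypothetical helper name, and it does not get around the block, it only makes the 403 response visible.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch(url, user_agent="Mozilla/5.0"):
    """Return (status, headers, body) even when the server answers with an error code.

    urlopen raises HTTPError for 4xx/5xx responses; the exception also exposes
    the status code, response headers and body, so we return those instead.
    """
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.getcode(), resp.headers, body
    except HTTPError as err:
        # err.code is the HTTP status (e.g. 403); err.read() returns the error page.
        body = err.read().decode("utf-8", errors="replace")
        return err.code, err.headers, body
```

For the URL in the question you could call `status, headers, body = fetch("https://www.g2.com/products/google-drive/reviews")` and print `status`, `headers.get("Server")` and the start of `body` to inspect the refusal.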

1 answer:

Answer 0 (score: 2)

An alternative approach:

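from bs4 import BeautifulSoup
import requests

url = "https://www.g2.com/products/google-drive/reviews"
# requests does not raise on 4xx/5xx responses, so you get the response body either way
req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
html = req.text

parsed_html = BeautifulSoup(html, features="lxml")
print(parsed_html)  # prints Cloudflare's "Access denied" page instead of reviews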

The problem is that this site blocks your request (see this answer). If you check the output of the code above, you will see:

<title>Access denied | www.g2.com used Cloudflare to restrict access</title>
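Rather than scanning the printed HTML by eye, a small check can extract the `<title>` and flag this block page before you try to parse any reviews. This sketch uses the standard library's `html.parser` (swapped in for BeautifulSoup so it runs without extra packages); `page_title` and `is_cloudflare_block` are hypothetical helper names, and matching on "Access denied" plus "Cloudflare" is an assumption based only on the title shown above.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the stripped <title> text, or None if the page has no title."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def is_cloudflare_block(html):
    """Heuristic: the block page title says access was restricted by Cloudflare."""
    title = page_title(html) or ""
    return "Access denied" in title and "Cloudflare" in title
```

With the answer's code, `is_cloudflare_block(html)` could be checked right after `html = req.text`, before handing the page to BeautifulSoup.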

PS: Your approach is fine; the 403 error is the server refusing the request ("Forbidden"), not a bug in your code.
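Since the refusal comes from Cloudflare rather than from your code, it can help to confirm that from the response itself. A minimal sketch, assuming that Cloudflare-served responses carry a `Server: cloudflare` header and/or a `CF-RAY` header (typical, but only a heuristic); `blocked_by_cloudflare` is a hypothetical helper name.

```python
def blocked_by_cloudflare(status_code, headers):
    """Heuristic check for a 403 refusal served by Cloudflare.

    `headers` may be any mapping with .items() (a dict, requests' CaseInsensitiveDict,
    or urllib's HTTPMessage); header names are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    return status_code == 403 and served_by_cloudflare
```

With the answer's `requests` code, this would be called as `blocked_by_cloudflare(req.status_code, req.headers)`. Taking a plain status code and header mapping keeps the check independent of whichever HTTP library fetched the page.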