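For a text-analysis project, I'm trying to scrape some reviews. I'm using Python and Beautiful Soup for this. I don't get any errors, but I don't get any data either. I'm sure I'm making a mistake in how I specify the div tag. Can anyone help? Here is the code I'm using: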
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.zomato.com/brewbot")
soup = BeautifulSoup(r.content)
links = soup.find_all("div")
k_data = soup.find_all({"class":"rev-text"})
for item in k_data:
    print item.text
I tried changing "class": "rev-text" to "tabindex='0'", to "class" - "rev.text", to include "itemprop"="description", and other combinations... nothing seems to work. Can anyone help?
Answer 0 (score: 2)
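The reviews are loaded dynamically from the response to a POST request to the social_load_more.php endpoint. Simulate that request in your code, take the HTML containing the reviews from the JSON response, and parse it with BeautifulSoup. Complete working code: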
import requests
from bs4 import BeautifulSoup
with requests.Session() as session:
    session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"}
    r = session.get("https://www.zomato.com/brewbot")
    soup = BeautifulSoup(r.content, "html.parser")
    itemid = soup.body["itemid"]
    # get reviews
    r = session.post("https://www.zomato.com/php/social_load_more.php", data={
        "entity_id": itemid,
        "profile_action": "reviews-top",
        "page": "0",
        "limit": "5"
    })
    reviews = r.json()["html"]
    soup = BeautifulSoup(reviews, "html.parser")
    k_data = soup.select("div.rev-text")
    for item in k_data:
        print(item.get_text())
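
If you want to check the parsing step without hitting the site, here is a minimal offline sketch. The sample payload, its review texts, and the extract_reviews helper are made up for illustration; the only thing taken from the code above is the assumption that the JSON response carries an HTML fragment under the "html" key with reviews in div.rev-text elements. It also shows the keyword form, find_all("div", class_="rev-text"), which should match the same elements as the CSS selector.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical payload imitating the shape the answer relies on:
# a JSON object whose "html" key holds an HTML fragment.
sample_response = json.dumps({
    "html": (
        '<div class="res-review">'
        '<div class="rev-text" tabindex="0">Great beer, slow service.</div>'
        '<div class="rev-text" tabindex="0">Loved the wood-fired pizza.</div>'
        '</div>'
    )
})


def extract_reviews(response_text):
    """Return the stripped text of every div.rev-text in the payload's HTML."""
    fragment = json.loads(response_text)["html"]
    soup = BeautifulSoup(fragment, "html.parser")
    return [div.get_text(strip=True) for div in soup.select("div.rev-text")]


reviews = extract_reviews(sample_response)
for text in reviews:
    print(text)

# Keyword form: tag name first, then class_ (with the trailing underscore,
# because "class" is a reserved word in Python).
soup = BeautifulSoup(json.loads(sample_response)["html"], "html.parser")
same = [d.get_text(strip=True) for d in soup.find_all("div", class_="rev-text")]
```

In the question's code, the dict was passed as the first positional argument of find_all, which is the tag-name filter, not the attribute filter; passing the class through class_ (or attrs={"class": "rev-text"}) is the usual way to filter on it.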