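I'm a Python beginner. For one restaurant, I need to scrape the restaurant name and its economic class. For each review I also need the client's name, the review date, the review title and the review text. I only need pages 10 to 40 (Python 3.7 and Beautiful Soup). However, when I open the CSV file, I only get the information of the first reviewer. Here is my code: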
# imports not shown in the original post; assumed from how csv, requests and soup are used
import csv
import requests
from bs4 import BeautifulSoup as soup

csv_file = open("lebouclard.csv", "w", encoding="utf-8")
csv_writer = csv.writer(csv_file, delimiter = ";")
csv_writer.writerow(["inf_rest_name", "rest_eclf", "name_client", "date_rev_cl", "titre_rev_cl", "opinion_cl"])
for i in range(10,40):
    url = requests.get("https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-Reviews-or10-Le_Bouclard-Paris_Ile_de_France.html".format(i)).text
    page_soup = soup(url, "html.parser")
    gen_rest = page_soup.find_all("div", {"class":"page"})
    for rest in gen_rest:
        # restaurant-level information
        rname = rest.find("h1",{"class":"ui_header h1"})
        inf_rest_name = rname.text
        print("inf_rest_name: " + inf_rest_name)
        econ_class_food = rest.find("div", {"class":"header_links"})
        rest_eclf = econ_class_food.text.strip()
        print("rest_eclf: " + rest_eclf)
        # review-level information
        for clients in gen_rest:
            client_info = clients.find_all("div", {"class":"info_text"})
            name_client = client_info[0].text
            print("name_client: " + name_client)
            date_review = clients.find_all("span", {"class":"ratingDate"})
            date_rev_cl = date_review[0].text.strip()
            print("date_rev_cl: " + date_rev_cl)
            titre_review = clients.find_all("span", {"class":"noQuotes"})
            titre_rev_cl = titre_review[0].text.strip()
            print("titre_rev_cl: " + titre_rev_cl)
            opinion = clients.find_all("p", {"class":"partial_entry"})
            opinion_cl = opinion[0].text.replace("\n","")
            print("opinion_cl: " + opinion_cl)
            csv_writer.writerow([inf_rest_name, rest_eclf, name_client, date_rev_cl, titre_rev_cl, opinion_cl])
csv_file.close()
I tried removing the "for clients in gen_rest" loop and putting this inside "for rest in gen_rest" instead:
client_info = rest.find_all("div", {"class":"info_text"})
name_client = client_info[0].text
print("name_client: " + name_client)
date_review = rest.find_all("span", {"class":"ratingDate"})
date_rev_cl = date_review[0].text.strip()
print("date_rev_cl: " + date_rev_cl)
titre_review = rest.find_all("span", {"class":"noQuotes"})
titre_rev_cl = titre_review[0].text.strip()
print("titre_rev_cl: " + titre_rev_cl)
opinion = rest.find_all("p", {"class":"partial_entry"})
opinion_cl = opinion[0].text.replace("\n","")
print("opinion_cl: " + opinion_cl)
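But the CSV file still shows the same information. Then I tried removing find_all and [0], and the result was still the same. What am I missing? I have read other questions about this problem but could not find my mistake.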
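The symptom (only the first reviewer in every row) is consistent with indexing [0] into page-wide find_all results. In that case client_info[0] is always the first match on the whole page, however many times the loop runs. Below is a minimal sketch of the usual fix: iterate over each review's own container and look up its fields relative to that container. To stay runnable without third-party packages, it uses the standard library's xml.etree.ElementTree on a small hand-written, well-formed snippet instead of Beautiful Soup. The markup is an assumption modelled loosely on the class names in the question and answer, not TripAdvisor's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each review sits in its own container.
# Class names are borrowed from the question/answer; real pages differ.
PAGE = """
<div class="page">
  <div class="reviewSelector">
    <div class="info_text"><div>Alice</div></div>
    <span class="noQuotes">Great bistro</span>
    <p class="partial_entry">Lovely food.</p>
  </div>
  <div class="reviewSelector">
    <div class="info_text"><div>Bob</div></div>
    <span class="noQuotes">Too noisy</span>
    <p class="partial_entry">Good, but loud.</p>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())

# Problematic pattern: search the whole page, then always take [0].
titles = root.findall(".//span[@class='noQuotes']")
print(titles[0].text)  # always the first review's title, never the second

# Per-container pattern: one iteration per review, fields looked up inside it.
rows = []
for review in root.findall(".//div[@class='reviewSelector']"):
    name = review.find("./div[@class='info_text']/div").text.strip()
    title = review.find(".//span[@class='noQuotes']").text.strip()
    opinion = review.find(".//p[@class='partial_entry']").text.strip()
    rows.append([name, title, opinion])

for row in rows:
    print(row)
```

With Beautiful Soup the same shape is a loop over soup.select('.reviewSelector') followed by review.select_one(...) for each field. That is the approach the answer takes.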
Answer (score: 0)
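Try the following. It uses an f-string so that, on each pass through the loop, the offset of the next set of reviews is inserted into the URL: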
import csv
import requests
from bs4 import BeautifulSoup as bs

with open("lebouclard.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter = ";", quoting=csv.QUOTE_MINIMAL)
    w.writerow(["inf_rest_name", "rest_eclf", "name_client", "date_rev_cl", "titre_rev_cl", "opinion_cl"])
    with requests.Session() as s:
        # offsets 0, 10, 20, 30; the f-string puts each one into the URL
        for offset in range(0,40,10):
            url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-Reviews-or{offset}-Le_Bouclard-Paris_Ile_de_France.html'
            r = s.get(url)
            soup = bs(r.content, 'lxml')  # the 'lxml' parser requires the lxml package
            # restaurant-level fields are read once, from the first page
            if not offset:
                inf_rest_name = soup.select_one('.heading').text.replace("\n","").strip()
                rest_eclf = soup.select_one('.header_links a').text.strip()
            # one iteration per review block; each field is looked up inside that block
            for review in soup.select('.reviewSelector'):
                name_client = review.select_one('.info_text > div:first-child').text.strip()
                date_rev_cl = review.select_one('.ratingDate')['title'].strip()
                titre_rev_cl = review.select_one('.noQuotes').text.strip()
                opinion_cl = review.select_one('.partial_entry').text.replace("\n","").strip()
                w.writerow([inf_rest_name, rest_eclf, name_client, date_rev_cl, titre_rev_cl, opinion_cl])
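The URL is where the loop differs most from the question's code. The question's URL has or10 hard-coded and no {} placeholder, so .format(i) has nothing to fill in and returns the string unchanged. The same page is therefore requested on every iteration. The sketch below only builds the URLs (no network access) to contrast the two approaches. The step of 10 follows the answer's range(0, 40, 10) and assumes each page of reviews is addressed by a multiple of 10.

```python
# URL copied from the question: "or10" is fixed and there is no {} placeholder.
QUESTION_URL = ("https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-"
                "Reviews-or10-Le_Bouclard-Paris_Ile_de_France.html")

# str.format ignores extra arguments when the string has no placeholders,
# so all 30 iterations produce the same URL.
question_urls = {QUESTION_URL.format(i) for i in range(10, 40)}
print(len(question_urls))  # 1 distinct URL for 30 requests

# Answer's approach: the offset is interpolated into the URL by an f-string.
answer_urls = [
    "https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-"
    f"Reviews-or{offset}-Le_Bouclard-Paris_Ile_de_France.html"
    for offset in range(0, 40, 10)
]
for url in answer_urls:
    print(url)
```

Putting {} where the offset belongs and keeping .format(i) would work just as well. The f-string simply makes the substitution visible at the point where the URL is written.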
On my setup, I had to set the delimiter to "," instead of ";" to get it working.
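The delimiter only controls how fields are separated in the file. Which one "works" depends on the program that opens it. Spreadsheet software configured for a French locale commonly expects ";", while many other tools assume ",". The sketch below writes one made-up row both ways into an in-memory buffer with the standard csv module and reads it back. It shows that csv.QUOTE_MINIMAL quotes a field only when it needs to, so a comma inside review text is handled correctly with either delimiter.

```python
import csv
import io

# Made-up sample row; the review text contains a comma.
row = ["Le Bouclard", "Alice", "Bon repas, service lent"]

outputs = {}
for delim in (";", ","):
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains the delimiter,
    # the quote character or a line-terminator character.
    csv.writer(buf, delimiter=delim, quoting=csv.QUOTE_MINIMAL).writerow(row)
    outputs[delim] = buf.getvalue()
    # Reading back with the same delimiter recovers the original fields.
    parsed = next(csv.reader(io.StringIO(outputs[delim]), delimiter=delim))
    assert parsed == row
    print(repr(outputs[delim]))
```

With ";" the review text is written unquoted. With "," it is wrapped in double quotes because it contains the delimiter. Either file round-trips correctly as long as the reader uses the same delimiter as the writer.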
Sample of the results: