I am trying to extract a review from a Zomato page using requests and Beautiful Soup 4 in Python. I want to store the link of the requested page and the extracted review in a CSV file.
My problem is that the extracted review is not stored in a single cell but gets split across multiple cells. How can I store the extracted review in one cell?
Here is my code:
import time
from bs4 import BeautifulSoup
import requests
URL = "https://www.zomato.com/review/eQEygl"
time.sleep(2)
reviewPage = requests.get(URL, headers = {'user-agent': 'my-app/0.0.1'})
reviewSoup = BeautifulSoup(reviewPage.content,"html.parser")
reviewText = reviewSoup.find("div",{"class":"rev-text"})
textSoup = BeautifulSoup(str(reviewText), "html.parser")
reviewElem = [URL, ""]
for string in textSoup.stripped_strings:
    reviewElem[1] += string
csv = open("out.csv", "w", encoding="utf-8")
csv.write("Link, Review\n")
row = reviewElem[0] + "," + reviewElem[1] + "\n"
csv.write(row)
csv.close()
Answer 0 (score: 0)
There is no need to construct the CSV string manually. When you do it by hand, a column separator (a comma by default) that appears inside a column value is interpreted as a delimiter rather than as a literal character, so the value ends up scattered across multiple columns.
Use the csv module and its .writerow() method:
import csv

# ...

with open("out.csv", "w", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Link", "Review"])
    writer.writerow(reviewElem)
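As a usage note, here is a minimal self-contained sketch of writing such a row; the review string is only a placeholder, and newline="" is passed to open() as the csv module's documentation recommends, so the module controls line endings itself and blank rows are avoided on Windows:

import csv

# Placeholder data in the same shape as the question's reviewElem: [link, review text].
reviewElem = ["https://www.zomato.com/review/eQEygl",
              "Great food, friendly staff, a bit pricey"]

# newline="" lets the csv module manage line endings (recommended by its docs).
with open("out.csv", "w", encoding="utf-8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Link", "Review"])
    # Any field containing the delimiter (a comma) is quoted automatically,
    # so the whole review lands in a single cell.
    writer.writerow(reviewElem)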
Answer 1 (score: 0)
I think the problem is the commas embedded in the string, since they are the default delimiter in most CSV software. The following avoids the issue by wrapping the contents of the string in " characters to indicate that it is a single cell:

import time
from bs4 import BeautifulSoup
import requests
URL = "https://www.zomato.com/review/eQEygl"
time.sleep(2)
reviewPage = requests.get(URL, headers = {'user-agent': 'my-app/0.0.1'})
reviewSoup = BeautifulSoup(reviewPage.content,"html.parser")
reviewText = reviewSoup.find("div",{"class":"rev-text"})
textSoup = BeautifulSoup(str(reviewText), "html.parser")
reviewElem = [URL, ""]
for string in textSoup.stripped_strings:
    reviewElem[1] += string
csv = open("out.csv", "w", encoding="utf-8")
csv.write("Link, Review\n")
#row = reviewElem[0] + "," + reviewElem[1] + "\n"
row = reviewElem[0] + ',"{}"\n'.format(reviewElem[1])  # wrap the review in quotes so it stays in one cell
csv.write(row)
csv.close()
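One caveat with this manual quoting: if the review text itself contains a " character, the row becomes malformed. In CSV, an embedded quote is escaped by doubling it, which is what the csv module does automatically. A small sketch of that escaping, using a hypothetical review string:

URL = "https://www.zomato.com/review/eQEygl"
# Hypothetical review containing both a comma and a double quote.
review = 'The "butter chicken" was great, but service was slow'

# CSV convention: double any embedded quotes, then wrap the whole field in quotes.
escaped = review.replace('"', '""')
row = URL + ',"{}"\n'.format(escaped)
print(row)  # ...,"The ""butter chicken"" was great, but service was slow"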