I'm very new to Python web scraping. I want to extract text from a website and export it to a CSV file, but I'm having trouble with the CSV file. When I run this code (with print):
```python
import requests
from bs4 import BeautifulSoup
import csv

URL = "https://intanseafood.com/demersal-fish"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')

quotes = []
table = soup.find('div', attrs={'id': 'archive-product'})
for row in table.find_all('div', attrs={'class': 'product-h2'}):
    quote = {}
    # print() only displays the text; it returns None, so quote['product'] is None here
    quote['product'] = print(row.get_text())
    quotes.append(quote)
```
Output:
```text
Fish Goldband Snapper Natural Cut
Fish Grouper Portion
Fish Ruby Snaper Natural Cut
Fish Croaker
Fish Grouper WGGS
Fish Pinjalo Snapper Natural Cut
Fish Parrotfish WGGS
Fish Snapper One Cut
```
But when I change it to this code (exporting to CSV):
```python
import requests
from bs4 import BeautifulSoup
import csv

URL = "https://intanseafood.com/demersal-fish"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')

quotes = []
table = soup.find('div', attrs={'id': 'archive-product'})
for row in table.find_all('div', attrs={'class': 'product-h2'}):
    quote = {}
    quote['product'] = row.get_text()
    quotes.append(quote)

filename = 'demersal.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['product'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
```
The CSV file is created, but it contains nothing except the header. Can anyone help me fix this? Thanks in advance.
Answer (score: 0)
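Your first output has a lot of whitespace, which means the strings contain tabs, spaces, or newlines. A bit of digging shows that they are newline and tab characters. Remove them, for example: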
```python
text = row.get_text()
# drop the tab and newline characters that surround the product name
quote['product'] = text.replace("\t", "").replace("\n", "")
```
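To see how the whitespace relates to the "header-only" file, here is a minimal stdlib-only sketch. It assumes, as the answer describes, that `get_text()` returns each name wrapped in newline and tab characters; the `raw_names` sample and the `to_csv` helper are hypothetical and not captured from the site. Because each field contains a line break, `csv` wraps it in quotes and writes the break into the cell, so a spreadsheet program may show only the blank first line of each cell, which could make the file look empty apart from the header.

```python
import csv
import io

# Hypothetical sample of what get_text() might return (assumed, not scraped)
raw_names = ["\n\t\tFish Grouper Portion\n\t", "\n\t\tFish Croaker\n\t"]


def to_csv(names):
    """Write names under a 'product' header and return the CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["product"])
    writer.writeheader()
    for name in names:
        writer.writerow({"product": name})
    return buf.getvalue()


# Fields containing a newline are quoted and keep the line breaks inside the cell
print(repr(to_csv(raw_names)))
# 'product\r\n"\n\t\tFish Grouper Portion\n\t"\r\n"\n\t\tFish Croaker\n\t"\r\n'

# str.strip() removes leading/trailing whitespace but keeps the inner spaces
print(repr(to_csv([n.strip() for n in raw_names])))
# 'product\r\nFish Grouper Portion\r\nFish Croaker\r\n'
```

`str.strip()` removes any leading and trailing whitespace (spaces, tabs, newlines), not just `\t` and `\n`, while leaving the spaces inside the name intact. BeautifulSoup can also do this directly with `row.get_text(strip=True)`; note that with `strip=True` the text of nested tags is joined without spaces unless you also pass `separator=" "`.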