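How do I correctly write a dictionary to a CSV file? I have stored my parsed data in a dictionary, and I want the values for each key written to a separate column, one column per key. For one of the key-value pairs (the key 'ff'), I also want to group the values and split them across 5 columns. For example:
0,4,9,14... - in the first column
1,5,10,15 - in the second ...etc.
The problem is that the data must be saved as UTF-8 so that the Russian characters in the file display correctly.
Below is a sample of my code. Right now everything is written into a single column; I want to use the CSV to produce a kind of price list.
I am using Python 2.7.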
import requests
from bs4 import BeautifulSoup
import csv
import re

def get_html(url):
    r = requests.get(url)
    return r.text

url='http://www.autobody.ru/kuzovnoy-remont/'
html=get_html(url)
soup=BeautifulSoup(html, 'html.parser')
mydivs = soup.findAll('a',class_="banners_images")
urls=[]
for i in mydivs:
    ur=(i.get('href'))
    ur='http://www.autobody.ru'+str(ur)
    urls.append(ur)
#head =[]
#headers = soup.findAll('h1')
#head.append(headers[0].text.strip())
images=[]
heads =[]
artic=[]
atrib=[]
price=[]
for i in urls:
    html=get_html(i)
    soup=BeautifulSoup(html, 'html.parser')
    head = soup.find('h1').get_text()
    heads.append(head )
    image=[x['src'] for x in soup.findAll('img', {'class': 'detimg'})]
    image1='http://www.autobody.ru'+image[0]
    images.append(image1)
    price1 = soup.find('div', class_='price').get_text()
    price1=re.sub(r"c",r"p", price1)
    price.append(price1)
    for tr in soup.find('table', class_='tech').find_all('tr'):
        artic.append(tr.get_text())
da={'titles': heads,'texts':price,'ff':artic,'images':images}
with open('c:\\1\\121.csv','a') as f:
    f.write(u'\ufeff'.encode('utf8')) # writes "byte order mark" UTF-8 signature
    writer=csv.writer(f)
    for i in da:
        for rows in da[i]:
            writer.writerow([rows.encode('utf8')])
Answer 0 (score: 1)
You need to use DictWriter.
Create the keys for the column names:
keys = mydict.keys()
or just list them manually:
keys = ["column1", "columns2"]
Write the data to the CSV:
with open(file_name, 'a', encoding="utf-8") as output_file:  # the encoding argument requires Python 3
    dict_writer = csv.DictWriter(output_file, keys, delimiter=',', lineterminator='\n')
    dict_writer.writeheader()
    dict_writer.writerows([mydict])
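DictWriter expects one dictionary per row, whereas the dictionary in the question maps each column name to a whole list of values. The following is a minimal sketch of one way to bridge the two, written for Python 3 and not taken from the answer itself. The sample values, the helper names `columns_to_rows` and `write_rows`, and the in-memory buffer used instead of a real file are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data shaped like the question's dict:
# column name -> list of values for that column.
columns = {
    "titles": ["CIVIC РУЧКА", "AUDI A4 БАМПЕР"],
    "texts": ["295 p", "3882 p"],
}
keys = ["titles", "texts"]  # an explicit list fixes the column order


def columns_to_rows(columns, keys):
    """Turn {'a': [1, 2], 'b': [3, 4]} into [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]."""
    column_lists = [columns[key] for key in keys]
    return [dict(zip(keys, values)) for values in zip(*column_lists)]


def write_rows(output_file, rows, keys):
    """Write a header line followed by one CSV line per row dictionary."""
    writer = csv.DictWriter(output_file, keys, delimiter=',', lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)


rows = columns_to_rows(columns, keys)

# io.StringIO stands in for a real file here; for a file on disk one would
# pass something like open(file_name, 'w', encoding='utf-8', newline='').
buffer = io.StringIO()
write_rows(buffer, rows, keys)
csv_text = buffer.getvalue()
```

Note that `zip` stops at the shortest list, so columns of unequal length would be silently truncated in this sketch.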
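Answer 1 (score: 1)
You have created a plain CSV writer, but you are trying to write your data out from a dictionary. You could use a dictionary writer, but I feel it makes more sense to avoid the dictionary altogether and convert your data into a correctly formatted list of rows.
At the moment you are building all of your data as columns, but it needs to be written as rows. The row/column swap can be done with zip(*[col1, col2, col3]). It also makes sense to encode your data as you go along: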
import requests
from bs4 import BeautifulSoup
import csv
import re

def get_html(url):
    r = requests.get(url)
    return r.text

url = 'http://www.autobody.ru/kuzovnoy-remont/'
html = get_html(url)
soup = BeautifulSoup(html, 'html.parser')
mydivs = soup.findAll('a',class_="banners_images")
urls = []
for i in mydivs:
    ur = (i.get('href'))
    ur = 'http://www.autobody.ru' + str(ur)
    urls.append(ur)
images = []
heads = []
artic = []
atrib = []
price = []
with open('121.csv', 'wb') as f: # Open the file in binary mode for Python 2.x
    f.write(u'\ufeff'.encode('utf8')) # writes "byte order mark" UTF-8 signature
    writer = csv.writer(f)
    for i in urls:
        html = get_html(i)
        soup = BeautifulSoup(html, 'html.parser')
        head = soup.find('h1').get_text()
        heads.append(head.encode('utf8'))
        image = [x['src'] for x in soup.findAll('img', {'class': 'detimg'})]
        image1 = 'http://www.autobody.ru'+image[0]
        images.append(image1.encode('utf8'))
        price1 = soup.find('div', class_='price').get_text()
        price1 = re.sub(r"c",r"p", price1)
        price.append(price1.encode('utf8'))
        for tr in soup.find('table', class_='tech').find_all('tr'):
            artic.append(tr.get_text().strip().encode('utf8'))
    writer.writerows(zip(*[heads, price, artic, images]))
This would give you an output file like this:
CIVIC РУЧКА ПЕРЕД ДВЕРИ ЛЕВ ВНЕШН ЧЕРН,295 p,"Артикул
HDCVC96-500B-L",http://www.autobody.ru/upload/images/HDCVC96-500B-L.jpg.pagespeed.ce.JnqIICpcSq.jpg
CIVIC РУЧКА ПЕРЕД ДВЕРИ ЛЕВ ВНЕШН ЧЕРН,295 p,"Артикул
HDCVC96-500B-L",http://www.autobody.ru/upload/images/HDCVC96-500B-L.jpg.pagespeed.ce.JnqIICpcSq.jpg
AUDI A4 БАМПЕР ПЕРЕДН ГРУНТ,3882 p,"ОЕМ#
72180S04003",http://www.autobody.ru/upload/images/AI0A401-160X.jpg.pagespeed.ce.onSZWY1J15.jpg
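The question also asks for the 'ff' values to be spread across 5 columns, which the zip call above does not do on its own. Below is a rough Python 3 sketch of one possible approach, not part of the original answer: split the flat list into consecutive groups of five and emit one group per row. The sample data, the helper name `chunk`, the constant `GROUP_SIZE`, and the assumption that every product contributes five values are all hypothetical. `zip_longest` is used instead of `zip` so that lists of unequal length are padded rather than truncated (on Python 2 the equivalent function is `itertools.izip_longest`).

```python
import csv
import io
from itertools import zip_longest


def chunk(values, size):
    """Split a flat list into consecutive groups of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Hypothetical sample data; the real lists would come from the scraping loop.
heads = ["item A", "item B"]
price = ["295 p", "3882 p"]
artic = ["a0", "a1", "a2", "a3", "a4", "b0", "b1", "b2"]  # last group is short

GROUP_SIZE = 5  # assumption: five 'ff' values belong to each product

rows = []
groups = chunk(artic, GROUP_SIZE)
for head, cost, group in zip_longest(heads, price, groups, fillvalue=None):
    cells = list(group or [])
    cells += [""] * (GROUP_SIZE - len(cells))  # pad short groups with empty cells
    rows.append([head or "", cost or ""] + cells)

# io.StringIO stands in for a real file opened with encoding='utf-8'.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
csv_text = buffer.getvalue()
```

With this layout, items 0, 5, 10, ... of the flat list end up in the first of the five columns, items 1, 6, 11, ... in the second, and so on.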