Question

So I've run into an encoding problem that comes from writing dictionaries to CSV in Python.

Here is some example code:

import csv

some_list = ['jalape\xc3\xb1o']

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        output_file.writerow([item])

This works fine and gives me a CSV file with "jalapeño" written in it.

However, when I create a list of dictionaries whose values contain UTF-8 characters like this...

import csv

some_list = [{'main': ['4 dried ancho chile peppers, stems, veins '
                       'and seeds removed']},
             {'main': ['2 jalape\xc3\xb1o peppers, seeded and chopped',
                       '1 dash salt']}]

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        output_file.writerow([item])

I just get a CSV file with 2 rows containing these entries:

{'main': ['4 dried ancho chile peppers, stems, veins and seeds removed']}
{'main': ['2 jalape\xc3\xb1o peppers, seeded and chopped', '1 dash salt']}
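The \xc3\xb1 sequence is the two-byte UTF-8 encoding of ñ. It shows up escaped because each row holds a whole dict, and a dict is written as its repr, which escapes non-ASCII bytes. A small illustration, assuming Python 3 and using a bytes literal to stand in for the question's Python 2 byte strings:

```python
# Python 3 sketch: a bytes literal mimics the Python 2 byte string from the question
raw = b'jalape\xc3\xb1o'      # UTF-8 bytes: the "\xc3\xb1" pair is one character (n with tilde)
text = raw.decode('utf-8')     # decoding the bytes gives the real text

print(len(raw), len(text))     # 9 8  (two bytes collapse into one character)

# Inside a container, the bytes are shown via repr(), so the escapes appear literally
nested = {'main': [raw]}
print(str(nested))             # {'main': [b'jalape\xc3\xb1o']}
```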

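I know my data is in the correct encoding, but because the values aren't strings, csv.writer writes them out as-is. This is frustrating. I searched here for similar questions, and people have mentioned using csv.DictWriter, but that doesn't work well for me because my dictionaries don't all have just the one key 'main'. Some have other keys such as 'toppings', 'crust', and so on. On top of that, I'm doing more processing, and the final output formats the ingredients as amount, unit, ingredient, so I'll end up with a list of dictionaries like: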

[{'main': {'amount': ['4'], 'unit': [''], 
'ingredient': ['dried ancho chile peppers']}},
{'topping': {'amount': ['1'], 'unit': ['pump'], 
'ingredient': ['cool whip']}, 'filling': 
{'amount': ['2'], 'unit': ['cups'], 
'ingredient': ['strawberry jam']}}]

Seriously, any help is hugely appreciated; otherwise I'll have to use find and replace in LibreOffice to fix all of these \x** UTF-8 escapes.

Thanks!

Answer 1

You are writing dictionaries to the CSV file, but .writerow() expects a list of singular values that are turned into strings when written.

Don't write dictionaries; they are all turned into their string representations, as you've discovered.
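To see this concretely, here is a minimal Python 3 sketch (an assumption, since the question uses Python 2) that writes a dict as a single cell into an in-memory buffer and reads it back. The csv module stringifies any non-string cell with str(), so the cell ends up holding the dict's repr rather than its contents:

```python
import csv
import io

some_list = [{'main': ['2 jalape\u00f1o peppers, seeded and chopped', '1 dash salt']}]

buf = io.StringIO()                 # in-memory stand-in for a real file
writer = csv.writer(buf)
for item in some_list:
    writer.writerow([item])         # one cell containing the whole dict

# Read the row back: the single cell is exactly str(dict)
cell = next(csv.reader(io.StringIO(buf.getvalue())))[0]
print(cell == str(some_list[0]))    # True
```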

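You need to decide how to turn the keys and/or values of each dictionary into columns, where each column holds a single primitive value.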

If, for example, you only want to write the main key (when it is present), do this:

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        if 'main' in item:
            output_file.writerow(item['main'])

This assumes that the value associated with the 'main' key is always a list of values.
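A more general sketch for the nested amount/unit/ingredient format shown in the question. It assumes Python 3 (where csv works with text and the file encoding is set in open()), assumes amount, unit and ingredient are always parallel lists of equal length, and uses a column layout (recipe, section, amount, unit, ingredient) chosen purely for illustration; flatten_recipes is a hypothetical helper name:

```python
import csv

# Nested structure from the question: amount/unit/ingredient are parallel lists
some_list = [
    {'main': {'amount': ['4'], 'unit': [''],
              'ingredient': ['dried ancho chile peppers']}},
    {'topping': {'amount': ['1'], 'unit': ['pump'],
                 'ingredient': ['cool whip']},
     'filling': {'amount': ['2'], 'unit': ['cups'],
                 'ingredient': ['strawberry jam']}},
]

def flatten_recipes(recipes):
    """Hypothetical helper: yield one flat row per ingredient."""
    for recipe_no, recipe in enumerate(recipes, start=1):
        for section, parts in recipe.items():   # 'main', 'topping', 'filling', ...
            # zip pairs the i-th amount with the i-th unit and i-th ingredient
            for amount, unit, ingredient in zip(parts['amount'], parts['unit'],
                                                parts['ingredient']):
                yield [recipe_no, section, amount, unit, ingredient]

# newline='' is what the csv docs recommend; the encoding is applied by the file object
with open('recipes.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['recipe', 'section', 'amount', 'unit', 'ingredient'])
    writer.writerows(flatten_recipes(some_list))
```

Because every row has the same columns, csv.DictWriter with a fixed fieldnames list would work equally well here, regardless of which section keys a recipe uses.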

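If you wanted to persist dictionaries with Unicode values, you are using the wrong tool. CSV is a flat data format: just rows of primitive columns. Use a tool that can preserve the right amount of information instead.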

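For dictionaries with string keys, lists, numbers and Unicode text, you can use JSON, or you can use pickle if more complex or custom data types are involved. When using JSON, you do want to either decode your byte strings to Python Unicode values, or consistently use UTF-8-encoded byte strings, or tell the json library how to handle string encoding for you with the encoding keyword: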

import json

with open('data.json', 'w') as jsonfile:
    json.dump(some_list, jsonfile, encoding='utf8')

The encoding matters because JSON strings are always Unicode values. The default for encoding is utf8, but I added it here for clarity.

Loading the data again:

with open('data.json', 'r') as jsonfile:
    some_list = json.load(jsonfile)

Note that this returns unicode strings, not UTF-8-encoded byte strings.
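The snippet above is Python 2; json.dump() in Python 3 no longer accepts an encoding keyword. A minimal Python 3 sketch of the same round trip, assuming the data already holds str (text) values, sets the encoding on the file instead and uses ensure_ascii=False so ñ is stored as-is rather than as a \u00f1 escape:

```python
import json

some_list = [{'main': ['2 jalape\u00f1o peppers, seeded and chopped', '1 dash salt']}]

# The file object handles UTF-8; ensure_ascii=False keeps non-ASCII characters readable
with open('data.json', 'w', encoding='utf-8') as jsonfile:
    json.dump(some_list, jsonfile, ensure_ascii=False)

with open('data.json', 'r', encoding='utf-8') as jsonfile:
    loaded = json.load(jsonfile)

print(loaded == some_list)  # True
```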

The pickle module works in much the same way, but its data format is not human-readable:

import pickle

# store
with open('data.pickle', 'wb') as pfile:
    pickle.dump(some_list, pfile)

# load
with open('data.pickle', 'rb') as pfile:
    some_list = pickle.load(pfile)

pickle restores your data exactly as you stored it. Byte strings stay byte strings, and unicode values are restored as unicode.
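A quick Python 3 check of that claim (the mixed sample below is made up for illustration): one byte string and one text string go in, and both come back with their original types:

```python
import pickle

# One bytes value and one str value, to show that both types survive
data = [{'main': [b'jalape\xc3\xb1o', 'jalape\u00f1o']}]

with open('data.pickle', 'wb') as pfile:
    pickle.dump(data, pfile)

with open('data.pickle', 'rb') as pfile:
    restored = pickle.load(pfile)

print(restored == data)                                  # True
print([type(v).__name__ for v in restored[0]['main']])   # ['bytes', 'str']
```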

Answer 2

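As you can see in your output, you've written out the dictionary itself, so if you want the strings inside it to be processed, you have to write: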

import csv

some_list = [{'main': ['4 dried ancho chile peppers, stems, veins', '\xc2\xa0\xc2\xa0\xc2\xa0 and seeds removed']}, {'main': ['2 jalape\xc3\xb1o peppers, seeded and chopped', '1 dash salt']}]

with open('test_encode_output.csv', 'wb') as csvfile:
    output_file = csv.writer(csvfile)
    for item in some_list:
        output_file.writerow(item['main'])  #so instead of [item], we use item['main']

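As far as I can tell, this probably isn't quite the code you want, since it ties you to every key being called main, but at least the values get processed now.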

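You may want to state more precisely what you are trying to do, because right now it isn't very clear (at least to me). For example, do you want a CSV file that gives you main in the first cell, and then 4 dried ...?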
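If that layout (key in the first cell, then one cell per value) is what you want, a sketch under these assumptions would not hard-code 'main' at all, so keys such as 'toppings' or 'crust' work too. It assumes Python 3 text strings and that every dict value is a list of strings:

```python
import csv

some_list = [{'main': ['4 dried ancho chile peppers, stems, veins and seeds removed']},
             {'main': ['2 jalape\u00f1o peppers, seeded and chopped', '1 dash salt']}]

# Python 3: text mode with an explicit encoding instead of 'wb'
with open('test_encode_output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for item in some_list:
        for key, values in item.items():
            # Key in the first cell, then one cell per value in the list
            writer.writerow([key] + list(values))
```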

Trying to write a list of dictionaries to CSV in Python, running into encoding issues
