import csv
import json

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]

filename = '/Users/congminmin/Downloads/kg-temp.csv'
reader = unicode_csv_reader(open(filename))
out_filename = '/Users/congminmin/Downloads/kg-temp.out'
#writer = open(out_filename, "w", "utf-8")
for question, answer in reader:
    print(question + " " + json.loads(answer)[0]['content'])
    #writer.write(question + " " + answer)
reader.close()
This code works in Python 2.7, but in Python 3.6 it gives the error message:
Unresolved reference 'unicode'
How can I adapt it to Python 3.6?
Answer 0 (score: 1)
First make sure your data is a str, not a byte string, and then simply use csv.reader without any per-cell decoding.
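import io
data = utf8_data.decode('utf-8')
# csv.reader expects an iterable of lines, so wrap the decoded str in a file-like object
for row in csv.reader(io.StringIO(data, newline=''), dialect=csv.excel, ...):
    # ...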
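As a minimal sketch of the decode-once approach, assuming the data arrives as UTF-8 bytes: the helper name `rows_from_utf8_bytes` and the sample bytes are made up for illustration and are not part of the question. The decoded text is wrapped in `io.StringIO` so that `csv.reader` receives lines rather than single characters.

```python
import csv
import io

# Hypothetical sample bytes standing in for the asker's UTF-8 data.
raw = 'question,answer\n"你好, 世界","line one\nline two"\n'.encode('utf-8')

def rows_from_utf8_bytes(raw_bytes, dialect=csv.excel, **kwargs):
    """Decode UTF-8 bytes once, then parse the text as CSV (illustrative helper)."""
    text = raw_bytes.decode('utf-8')
    # newline='' leaves line endings untouched so quoted fields may span lines.
    return list(csv.reader(io.StringIO(text, newline=''), dialect=dialect, **kwargs))

rows = rows_from_utf8_bytes(raw)
for row in rows:
    print(row)
```

Every cell comes back as a normal `str`, so no `unicode(cell, 'utf-8')` call is needed.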
Answer 1 (score: 0)
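Python 3 already has excellent Unicode support. Whenever you open a file in text mode, you can pass a specific encoding or rely on the default, which depends on the platform's locale (often UTF-8, but not guaranteed), so pass encoding='utf-8' when in doubt. In Python 3 there is no longer a distinction between str and unicode: the latter does not exist, and the former has full Unicode support. This simplifies your job considerably, because the wrapper function is not needed at all. You can iterate over a plain csv.reader.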
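To illustrate the str/bytes split described here, a small sketch with a made-up sample string: in Python 3, `str` holds Unicode text, `bytes` holds its encoded form, and the name `unicode` simply no longer exists.

```python
text = '你好'                    # str: a sequence of Unicode code points
data = text.encode('utf-8')     # bytes: the UTF-8 encoded form of the same text

print(type(text).__name__, len(text))   # length counted in characters
print(type(data).__name__, len(data))   # length counted in bytes

# Decoding the bytes gives back an equal str.
round_trip = data.decode('utf-8')

# The Python 2 built-in `unicode` is gone in Python 3.
try:
    unicode
except NameError:
    unicode_exists = False
else:
    unicode_exists = True
```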
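As a side note, you should always open files in a with block, so that they are cleaned up if any exception occurs. The file is also closed automatically when the block ends: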
with open(filename, encoding='utf-8', newline='') as f:  # The default mode is 'rt'; the encoding is given explicitly
    for question, answer in csv.reader(f):
        # Do your thing here. Both question and answer are normal strings
        pass
This only works if every row is guaranteed to contain exactly 2 elements. You are better off doing something like this:
with open(filename, encoding='utf-8', newline='') as f:  # The default mode is 'rt'; the encoding is given explicitly
    for row in csv.reader(f):
        if len(row) != 2:
            continue  # Or handle the anomaly by other means
        question, answer = row
        # Do your thing here as before
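Putting the pieces together, the question's loop might be adapted to Python 3 roughly as follows. This is a sketch under assumptions: the function name `read_question_answers`, the temporary sample file, and its contents are invented so the example runs on its own, and the answer column is assumed to hold a JSON list of objects with a 'content' key, as in the question's `json.loads(answer)[0]['content']`.

```python
import csv
import json
import os
import tempfile

def read_question_answers(path):
    """Yield (question, content) pairs from a two-column UTF-8 CSV file."""
    with open(path, encoding='utf-8', newline='') as f:
        for row in csv.reader(f):
            if len(row) != 2:
                continue  # skip rows that do not have exactly two cells
            question, answer = row
            yield question, json.loads(answer)[0]['content']

# Build a small sample file (made-up data) so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'kg-sample.csv')
    with open(sample_path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['什么是KG?',
                         json.dumps([{'content': '知识图谱'}], ensure_ascii=False)])
        writer.writerow(['only one column'])  # malformed row, will be skipped
    results = list(read_question_answers(sample_path))

for question, content in results:
    print(question + ' ' + content)
```

With the real data, `sample_path` would be replaced by the question's `filename`; no explicit `reader.close()` is needed because the `with` block closes the file.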