I am trying to create a copy of a CSV file without its header row. When I try this, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.
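I have read the Python documentation on CSV and on Unicode/UTF-8 encoding and implemented what it describes.
However, my output file is generated with no data in it. I am not sure what I am doing wrong here.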
import csv
path = '/Users/johndoe/file.csv'
with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:
    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)
        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                continue
            output.writerow(row)
    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')
unicode_csv(infile, outfile)
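Answer 0: (score: 7)
The solution was simply to include two additional parameters in with open(path, 'r') as infile:
The two parameters are encoding='UTF-8' and errors='ignore'. This let me create a copy of the original CSV without the header and without the UnicodeDecodeError. The complete code is below.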
import csv
path = '/Users/johndoe/file.csv'
with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open(path + 'final.csv', 'w') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)
    for index, row in enumerate(inputs):
        # Create file with no header
        if index == 0:
            continue
        output.writerow(row)
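A slightly more defensive variant of the code above, offered only as a sketch: it skips the header with next() instead of an index check, opens both files with newline='' as the csv module documentation recommends, and replaces undecodable bytes instead of dropping them silently. The helper name copy_without_header and the errors='replace' default are assumptions made for illustration; they are not part of the original answer.

```python
import csv


def copy_without_header(src, dst, encoding='utf-8', errors='replace'):
    """Copy the CSV at src to dst, skipping the first (header) row.

    Hypothetical helper; returns the number of data rows written.
    """
    # newline='' lets the csv module handle line endings itself.
    with open(src, 'r', encoding=encoding, errors=errors, newline='') as infile, \
            open(dst, 'w', encoding='utf-8', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        next(reader, None)  # discard the header row; no error on an empty file
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count
```

Calling copy_without_header(path, path + 'final.csv') would mirror the paths used above. With errors='replace', each undecodable byte shows up as U+FFFD in the output, so damaged cells stay visible rather than quietly losing characters.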
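Answer 1: (score: 2)
The call unicode_csv(infile, outfile) on its own line is not indented, so it falls outside the scope of the with statement, and by the time it is called both infile and outfile have been closed.
Files should be opened where they are used, not where the function is defined, so you would have: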
with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:
    unicode_csv(infile, outfile)
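Besides the scoping issue, there appears to be a second reason the question's function writes nothing: because unicode_csv contains a yield, Python treats it as a generator function, so calling it only creates a generator object and runs none of the body until something iterates over it. The small sketch below illustrates that behaviour; the names write_rows and log are hypothetical stand-ins, not code from the question.

```python
log = []


def write_rows(rows):
    """Generator function: the body runs only while it is being iterated."""
    for row in rows:
        log.append(row)  # stands in for output.writerow(row)
        yield row


gen = write_rows([['a', '1'], ['b', '2']])
calls_before_iteration = list(log)  # still empty: nothing has run yet

consumed = list(gen)                # iterating drives the body
calls_after_iteration = list(log)   # now holds both rows
```

In the question's code, removing the yield line and calling the function inside the with block should address both problems.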
Answer 2: (score: 1)
If you are able to use pandas and know the exact encoding of the file, you can try the following:
import pandas as pd
path = '/Users/johndoe/file.csv'
df = pd.read_csv(path, encoding='ISO-8859-1')
df.to_csv(path, encoding='utf-8', index=False)
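Why a different encoding can help is easy to check with a tiny sketch, assuming the offending byte really is a Latin-1 no-break space (the question does not confirm the file's actual encoding): byte 0xa0 is not a valid start byte in UTF-8, but ISO-8859-1 maps it to U+00A0. The sample bytes below are made up for illustration.

```python
raw = b'Oslo\xa0'  # hypothetical cell ending in byte 0xa0

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xa0 cannot start a UTF-8 sequence

latin1_text = raw.decode('iso-8859-1')    # 0xa0 becomes U+00A0 (no-break space)
round_trip = latin1_text.encode('utf-8')  # re-encoded as the two bytes C2 A0
```

Note that the pandas snippet above writes back to the same path and keeps the header row; passing header=False to to_csv would omit it.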