Reading and then writing a CSV whose text encoding is "Non-ISO extended-ASCII"

Asked: 2018-10-09 16:31:13

Tags: python python-3.x

My CSV contains strings like the following:

TîezÑnmidnan

I am trying to set up the reader and writer like this:

import csv

# File that will be written to
csv_output_file = open(file, 'w', encoding='utf-8')
# File that will be read in
csv_file = open(filename, encoding='utf-8', errors='ignore')
# Define reader
csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
# Define writer
csv_writer = csv.writer(csv_output_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

Then I iterate over the rows that were read in:

# Iterate over the rows in the csv
for idx, row in enumerate(csv_reader):
    csv_writer.writerow(row[0:30])

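The problem is in my output file: I cannot get the string to come out the same as it went in. According to my Mac, the CSV file's encoding is "Non-ISO extended-ASCII".

I have tried a variety of encodings. Some strip the special characters, and others do not work at all.

This is odd, because I can hard-code the string above into a variable and use it without any problem, so I think it has to do with how I am reading the file. If I inspect the value in the debugger at a breakpoint just before the write, it looks like this: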

T�ez�nmidnan
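The two symptoms described here (dropped characters in the output, replacement characters in the debugger) are what a UTF-8 decoder produces when it is given bytes from a single-byte encoding. The sketch below reproduces that, assuming purely for illustration that the file is Latin-1; the file's real encoding is not known from the question.

```python
# Assumption: the bytes on disk are Latin-1 (ISO-8859-1). This is a guess made
# only to illustrate the behaviour; the real file may use another code page.
text = "TîezÑnmidnan"
raw = text.encode("latin-1")  # b'T\xeeez\xd1nmidnan'

# 0xEE and 0xD1 look like UTF-8 lead bytes, but no continuation bytes follow,
# so a UTF-8 decoder treats each of them as an error.
ignored = raw.decode("utf-8", errors="ignore")    # invalid bytes are dropped
replaced = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
recovered = raw.decode("latin-1")                 # matching codec round-trips

print(ignored)    # Teznmidnan
print(replaced)   # T�ez�nmidnan
print(recovered)  # TîezÑnmidnan
```

A hard-coded string literal never goes through this decoding step, which would explain why it works while the value read from the file does not.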

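I cannot convert the file before running the script, so the Python code has to handle all of the conversion itself.

The expected output is for the string to stay exactly as it is in the input file: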

TîezÑnmidnan
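One way the script could do the conversion itself is to decode the input with its actual single-byte encoding and write the output as UTF-8. The following is a minimal sketch, not a confirmed fix: the helper name `transcode_csv` is hypothetical, and the default `source_encoding="cp1252"` is an assumption (other candidates such as `latin-1` or `mac_roman` may be the right one for this file).

```python
import csv


def transcode_csv(src_path, dst_path, source_encoding="cp1252"):
    """Copy a CSV row by row, decoding with source_encoding and writing UTF-8.

    source_encoding is an assumption; pass whichever codec matches the file.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, encoding=source_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter=",", quotechar='"')
        writer = csv.writer(dst, delimiter=",", quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row)
```

No `errors='ignore'` is used here on purpose: if the chosen codec is wrong for some byte, a `UnicodeDecodeError` is easier to diagnose than silently missing characters.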

I am adding a link to a sample CSV that shows the problem, along with a complete version of my code (with some details removed).

Example file to run with this

import tkinter as tk
from tkinter.filedialog import askopenfilename
import csv
import os

root = tk.Tk()
root.withdraw()

# Ask for file
filename = os.path.abspath(askopenfilename(initialdir="/", title="Select csv file", filetypes=(("CSV Files", "*.csv"),)))
# Set output file name: strip the extension and append a suffix
# (splitext keeps any other dots in the path intact)
output_name = os.path.splitext(filename)[0] + "_processed.csv"
# Using the file that will be written to
csv_output_file = open(os.path.abspath(output_name), 'w', encoding='utf-8')
# Using the file that will be read in
csv_file = open(filename, encoding='utf-8', errors='ignore')
# Define reader with , delimiter
csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
# Define writer to put quotes around input values with a comma in them
csv_writer = csv.writer(csv_output_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

header_row = []
# Iterate over the rows in the csv
for idx, row in enumerate(csv_reader):
    if idx != 0:
        csv_writer.writerow(row)
    else:
        header_row = row
        csv_writer.writerow(header_row)
csv_file.flush()
csv_output_file.flush()
csv_file.close()
csv_output_file.close()

Expected result

Header1,Header2
Value1,TîezÑnmidnan

Actual result

Header1,Header2
Value1,Teznmidnan

Edit: chardetect reports "utf-8 with confidence 0.99".
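A detector's answer is a heuristic guess, so it can be worth checking the raw bytes directly. The sketch below lists the lines of a file that fail strict UTF-8 decoding, so the offending byte values can be inspected; the helper name `find_non_utf8_lines` is hypothetical and not part of any library.

```python
def find_non_utf8_lines(path):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad = []
    # Binary mode avoids any decoding, so the original bytes are preserved.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")  # strict decoding raises on invalid bytes
            except UnicodeDecodeError:
                bad.append((lineno, raw))
    return bad
```

If this returns an empty list, the file really is valid UTF-8 and the problem lies elsewhere; if it returns lines containing bytes such as `\xee` or `\xd1`, the file is in some single-byte encoding rather than UTF-8.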

0 answers:

No answers