I have a txt document that contains the letters 'øæå', and I want this script to recognize those letters and write them correctly to a csv file.
with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:
        line = file.readline()
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')
        for s in splitTab:
            newS = s[1:-1]
        date = splitTab[0].replace('.', '/')
        insertList = [date,]
        out.writerow(date)
This gives:
File "Q:\DropBox\Development\Scripts\tes2.py", line 17, in <module>
lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 14: invalid start byte
Answer 0 (score: 0)
with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:
        line = file.readline()
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')
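Remove line = file.readline(). You are already iterating over (reading) the lines with the for line in file construct, so the extra call consumes a second line on every pass and every other line is lost.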
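The effect of the extra readline() call can be seen in a small sketch. It assumes Python 3 and uses io.StringIO as a stand-in for the real file; the function names and the sample rows are made up for illustration.

```python
import io


def read_with_extra_readline(stream):
    """Mimic the question's loop: iterate AND call readline() each pass."""
    seen = []
    for line in stream:
        # The for loop has already consumed one line; readline() now
        # consumes the next one and overwrites the loop variable.
        line = stream.readline()
        seen.append(line.rstrip("\n"))
    return seen


def read_by_iteration(stream):
    """Plain iteration: every line is seen exactly once."""
    return [line.rstrip("\n") for line in stream]


# Hypothetical sample data standing in for transaksjonliste.txt.
sample = "row1\nrow2\nrow3\nrow4\n"
skipped = read_with_extra_readline(io.StringIO(sample))
complete = read_by_iteration(io.StringIO(sample))
print(skipped)
print(complete)
```

With this sample only the even-numbered rows survive the first loop, while plain iteration returns all four.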
lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
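will not do what you want, because it encodes the text to ISO-8859-1 and then tries to decode those ISO-8859-1 bytes as if they were UTF-8. If you want to convert ISO-8859-1 to UTF-8, you would normally do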
lineS = line.decode('ISO-8859-1', 'ignore').encode('utf-8')
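The difference between the two directions can be checked on the letters from the question. This is a minimal sketch assuming Python 3, where decode() exists only on bytes objects; the variable names are illustrative.

```python
text = "øæå"

# Encoding to ISO-8859-1 produces one byte per letter: b'\xf8\xe6\xe5'.
latin1_bytes = text.encode("ISO-8859-1")

# Decoding those bytes as UTF-8 fails, because 0xf8 is not a valid
# UTF-8 start byte. This is the error shown in the traceback.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# The working direction: decode with the encoding the bytes are really in,
# then encode to the target encoding.
utf8_bytes = latin1_bytes.decode("ISO-8859-1").encode("utf-8")
print(decode_failed, utf8_bytes)
```

In UTF-8 each of these letters takes two bytes, which is why the two byte strings cannot be swapped for each other.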
However, the codecs.open() expression has already decoded the ISO-8859-1 bytes to unicode for you. So you only need to do
lineS = line.encode('utf-8')
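Putting the pieces together, one possible shape for the whole script is sketched below. It assumes Python 3, where the csv module expects text rather than bytes, so the sketch lets the output file do the UTF-8 encoding instead of calling encode() on each line. The function name convert_transactions, the output path, and the assumption that fields are wrapped in double quotes are illustrative and not taken from the question.

```python
import csv


def convert_transactions(src_path, dst_path):
    """Copy a ';'-separated ISO-8859-1 text file to a UTF-8 CSV file."""
    # Reading with encoding="ISO-8859-1" decodes the bytes to str once.
    # Writing with encoding="utf-8" encodes them again on the way out.
    with open(src_path, "r", encoding="ISO-8859-1") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        out = csv.writer(dst)
        for line in src:  # no extra readline() here
            fields = line.rstrip("\r\n").split(";")
            # Assumption: fields are wrapped in double quotes.
            fields = [field.strip('"') for field in fields]
            # Same date tweak as in the question: 01.02.2014 -> 01/02/2014
            fields[0] = fields[0].replace(".", "/")
            # writerow() takes a sequence of fields, not a bare string.
            out.writerow(fields)
```

It could be called as convert_transactions('transaksjonliste.txt', 'out.csv'), where out.csv is a placeholder name.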