Convert ISO-8859-1 to UTF-8 (øæå)

Posted: 2014-09-30 09:23:45

Tags: python csv unicode latin1

I have a txt document that contains the letters 'øæå', and I want this script to recognize them and write them correctly to a csv file.

with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:

        line = file.readline() 
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')

        for s in splitTab:
            newS = s[1:-1]

        date = splitTab[0].replace('.', '/')
        insertList = [date,]
        out.writerow(date)

This gives:

  File "Q:\DropBox\Development\Scripts\tes2.py", line 17, in <module>
    lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 14: invalid start byte
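The error can be reproduced in isolation. This is a Python 3 sketch (the question's code is Python 2), and the sample string "Kjøp" is a made-up stand-in for a line of the file. codecs.open has already decoded the line to text, so encoding it back to ISO-8859-1 turns ø into the single byte 0xf8, and 0xf8 can never begin a UTF-8 sequence.

```python
# Python 3 reproduction of the traceback (the question itself uses Python 2).
sample = "Kjøp"  # hypothetical line content; codecs.open already returned decoded text

# Encoding back to ISO-8859-1 turns "ø" into the single byte 0xf8.
raw = sample.encode("ISO-8859-1", "ignore")
print(raw)  # b'Kj\xf8p'

# Decoding those bytes as UTF-8 fails: 0xf8 is never a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)  # 'utf-8' codec can't decode byte 0xf8 in position 2: invalid start byte
```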

1 answer:

Answer 0 (score: 0)

with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:

        line = file.readline() 
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')

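Remove line = file.readline(). You are already iterating over (reading) the lines with the for line in file construct, so the extra readline() call overwrites the line the loop just read with the next one, and only every second line gets processed.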
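A minimal Python 3 illustration of what the extra readline() does, using io.StringIO as a stand-in for the real file (an assumption made so the example runs without transaksjonliste.txt). The helper names are hypothetical.

```python
import io


def lines_seen_with_readline(text):
    """Mimic the question's loop: iterate over the file AND call readline() inside it."""
    f = io.StringIO(text)
    seen = []
    for line in f:              # the loop consumes one line...
        line = f.readline()     # ...and readline() consumes the next, replacing it
        seen.append(line.rstrip("\n"))
    return seen


def lines_seen_with_iteration(text):
    """Plain iteration: every line is visited exactly once."""
    f = io.StringIO(text)
    return [line.rstrip("\n") for line in f]


data = "a\nb\nc\nd\n"
print(lines_seen_with_readline(data))   # ['b', 'd']
print(lines_seen_with_iteration(data))  # ['a', 'b', 'c', 'd']
```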

lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')

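will not do what you want, because it encodes to ISO-8859-1 and then tries to decode those ISO-8859-1 bytes as if they were UTF-8. If you want to convert ISO-8859-1 to UTF-8, you would normally do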

 lineS = line.decode('ISO-8859-1', 'ignore').encode('utf-8')
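The snippet above is Python 2, where str has a decode() method. In Python 3 only bytes can be decoded, so the same ISO-8859-1 to UTF-8 conversion on raw bytes looks like the sketch below. Because ISO-8859-1 maps every byte value from 0x00 to 0xFF to a character, the decode step cannot fail, so the 'ignore' argument never has anything to ignore.

```python
latin1_bytes = b"\xf8\xe6\xe5"            # 'øæå' as stored in an ISO-8859-1 file (1 byte each)

text = latin1_bytes.decode("ISO-8859-1")  # bytes -> str: 'øæå'
utf8_bytes = text.encode("utf-8")         # str -> bytes: 2 bytes per letter in UTF-8

print(text)        # øæå
print(utf8_bytes)  # b'\xc3\xb8\xc3\xa6\xc3\xa5'
```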

However, the codecs.open() call has already decoded the file from ISO-8859-1 to unicode, so all you need is

lineS = line.encode('utf-8')
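For reference, a sketch of the whole task in Python 3, where open() takes an encoding argument and codecs.open is not needed. The function name convert_transactions is hypothetical, and two behaviours are assumptions rather than something the question states: the input fields are separated by ';' and possibly wrapped in double quotes (which is what s[1:-1] suggests), and each output row holds the converted date followed by the remaining fields.

```python
import csv


def convert_transactions(src_path, dst_path):
    """Hypothetical helper: read an ISO-8859-1 ';'-separated file, write a UTF-8 CSV."""
    # Decode once, when the input is opened; every row is then already a str.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Assumption: ';' delimiter; csv.reader also strips surrounding "quotes".
        reader = csv.reader(src, delimiter=";")
        out = csv.writer(dst)
        for fields in reader:          # a single pass over the rows, no readline()
            if not fields:
                continue               # skip blank lines
            date = fields[0].replace(".", "/")
            # writerow expects a sequence of fields; passing the bare string
            # would write each character of the date as its own column.
            out.writerow([date] + fields[1:])
```

For example, convert_transactions('transaksjonliste.txt', 'transaksjonliste.csv') would write the converted file; the output path here is illustrative only.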