Reading a UTF-8 file with codecs in IronPython

Asked: 2012-04-12 12:10:27

Tags: python encoding csv utf-8 ironpython

I have a UTF-8 encoded .csv file that contains both Latin and Cyrillic characters.

;F1;F2;abcdefg3;F200
;ABSOLUTE;NOMINAL;NOMINAL;NOMINAL
o1;1;USA;Новосибирск;1223

I am trying to run the following script in IronPython 2.7.1:

import codecs

f = codecs.open(r"file.csv", "rb", "utf-8")
f.next()

While f.next() executes, an exception is raised:

Traceback (most recent call last):
  File "c:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\1.1\visualstudio_py_repl.py", line 492, in run_file_as_main
    code.Execute(self.exec_mod)
  File "<string>", line 4, in <module>
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 684, in next
    return self.reader.next()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 615, in next
    line = self.readline()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

The same script works fine in CPython 2.7. Also, the following script works fine under IronPython 2.7.1:

import codecs

f = codecs.open(r"file.csv", "rb", "utf-8")
f.readlines()

Does anyone know what could be causing this strange behavior?

2 answers:

Answer 0 (score: 2)

This looks like it might be a bug in how the codecs module handles next(). Could you please open an issue with the repro attached?
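
In the meantime, one possible workaround builds on the question's own observation that readlines() works in IronPython 2.7.1 where next() fails: read all lines up front, then iterate over the resulting list. This is only a minimal sketch. The helper name iter_decoded_lines is hypothetical, and because it loads the whole file into memory it only suits files that fit comfortably in RAM.

```python
import codecs


def iter_decoded_lines(path, encoding="utf-8"):
    """Yield decoded lines from `path` without calling next() on the reader.

    Hypothetical helper: it relies on readlines(), which the question
    reports as working under IronPython 2.7.1.
    """
    f = codecs.open(path, "rb", encoding)
    try:
        # Decode the whole file at once; line endings are kept.
        lines = f.readlines()
    finally:
        f.close()
    for line in lines:
        yield line
```

Usage would then look like `for line in iter_decoded_lines(r"file.csv"): ...`, with each line arriving as a unicode string.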

Answer 1 (score: 0)

The problem may be the "rb" argument; try using "r" instead:

f = codecs.open(r"file.csv", "r", "utf-8")