Question

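I have some files containing Unicode data. The following code reads them without problems under CPython, but under IronPython it crashes with "failed to decode bytes at index 67".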

        for f in self.list_of_files:
            all_words_in_file = []

            with codecs.open(f,encoding="utf-8-sig") as file_obj:
                for line in file_obj:
                    all_words_in_file.extend(line.split(" "))

            #print "Normalising unicode strings"

            normal_list = []
            #gets all the words and remove duplicate words 
            #the list will contain unique normalized words
            for l in all_words_in_file:
                    normal_list.append(normalize('NFKC',l))

            file_listing.update({f:normal_list})
        return file_listing

I can't figure out why this happens. Is there another way to read Unicode data in IronPython?

Answer 1

How about this:

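import codecs

def lines(filename):
    # Read raw bytes and decode one complete line at a time.
    with open(filename, "rb") as f:
        first = f.readline()
        # Drop the UTF-8 BOM only if the file actually starts with one.
        if first.startswith(codecs.BOM_UTF8):
            first = first[len(codecs.BOM_UTF8):]
        yield first.strip().decode("utf-8")
        for line in f:
            yield line.strip().decode("utf-8")

all_words_in_file = []
for line in lines("text-utf8-with-bom.txt"):
    all_words_in_file.extend(line.split(" "))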

I have also filed an IronPython bug: https://ironpython.codeplex.com/workitem/34951

As long as you pass whole lines to decode, everything works fine.
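To illustrate why whole lines matter, here is a minimal sketch. It assumes the failure comes from a multi-byte UTF-8 sequence being cut in the middle before decoding, which is one plausible explanation and not something the bug report is known to confirm. The sample text and the helper name decodes_cleanly are made up for the example.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample line; the "e with acute accent" takes two bytes in UTF-8.
data = u"caf\u00e9 na\u00efve\n".encode("utf-8")

def decodes_cleanly(chunk):
    """Return True if the byte chunk is valid, complete UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A complete line always ends on a character boundary, so it decodes.
whole_line_ok = decodes_cleanly(data)

# Cut the buffer between the two bytes of the accented character.
split = data.index(b"\xc3") + 1
partial_ok = decodes_cleanly(data[:split])

print(whole_line_ok)
print(partial_ok)
```

A newline is a single byte in UTF-8 and never appears inside a multi-byte sequence, so splitting on line boundaries cannot cut a character in half.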
