I have some files containing Unicode data. The following code works fine when reading them with CPython, but on IronPython it crashes with "failed to decode bytes at index 67":
    import codecs
    from unicodedata import normalize

    # Fragment of a method body; file_listing is a dict created earlier.
    for f in self.list_of_files:
        all_words_in_file = []
        with codecs.open(f, encoding="utf-8-sig") as file_obj:
            for line in file_obj:
                all_words_in_file.extend(line.split(" "))
        #print "Normalising unicode strings"
        normal_list = []
        #gets all the words and remove duplicate words
        #the list will contain unique normalized words
        for l in all_words_in_file:
            normal_list.append(normalize('NFKC', l))
        file_listing.update({f: normal_list})
    return file_listing
I can't figure out why. Is there another way to read Unicode data in IronPython?
Answer 0 (score: 0)
How about this:
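    def lines(filename):
        # Read raw bytes and decode one complete line at a time.
        f = open(filename, "rb")
        try:
            # Skip the 3-byte UTF-8 BOM at the start of the first line.
            yield f.readline()[3:].strip().decode("utf-8")
            for line in f:
                yield line.strip().decode("utf-8")
        finally:
            f.close()

    all_words_in_file = []
    for line in lines("text-utf8-with-bom.txt"):
        all_words_in_file.extend(line.split(" "))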
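The `[3:]` slice assumes the file always begins with a three-byte UTF-8 BOM. A slightly more defensive sketch, assuming the file is small enough to read into memory, checks for `codecs.BOM_UTF8` before stripping it. `read_utf8_lines` is a hypothetical helper name chosen for illustration, and this has not been verified against IronPython specifically.

```python
import codecs

def read_utf8_lines(filename):
    """Return the decoded lines of a UTF-8 file, dropping a leading BOM if present."""
    # Hypothetical helper: work on raw bytes so each decode call sees a complete line.
    with open(filename, "rb") as f:
        data = f.read()
    # Strip the BOM only when it is actually there.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Split on byte-level line endings first, then decode each whole line.
    return [raw.decode("utf-8") for raw in data.splitlines()]
```

The returned list can be used in place of the generator, for example `for line in read_utf8_lines("text-utf8-with-bom.txt"): ...`.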
I have also filed an IronPython bug: https://ironpython.codeplex.com/workitem/34951
As long as you pass whole lines to the decoder, everything should be fine.
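One way to see why whole lines matter: a UTF-8 character can occupy several bytes, and decoding a buffer that ends partway through one fails, whereas a newline byte never falls inside a multi-byte sequence. The sketch below illustrates this in general terms with made-up sample text; it is not a reproduction of the IronPython bug itself.

```python
import codecs

data = u"na\u00efve".encode("utf-8")  # U+00EF takes two bytes in UTF-8
first, second = data[:3], data[3:]    # cut in the middle of that character

try:
    first.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False  # the truncated chunk cannot be decoded on its own

# An incremental decoder buffers the incomplete tail until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(first) + decoder.decode(second, final=True)
```

Here `split_ok` ends up `False`, while the incremental decoder reassembles the original string.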