Environment:
- Mac OS Yosemite
- Python 2.7
- The file I am reading is saved in txt format

I have a script that splits Chinese text into sentences. Here is the code:
# coding: utf-8

# Sentence-ending punctuation, decoded to unicode (Python 2)
cutlist = "。!?".decode('utf-8')

def FindToken(cutlist, char):
    # True if char is one of the sentence-ending marks
    return char in cutlist

def Cut(cutlist, lines):
    l = []      # completed sentences
    line = []   # characters of the sentence being built
    for i in lines:
        if FindToken(cutlist, i):
            line.append(i)
            l.append(''.join(line))
            line = []
        else:
            line.append(i)
    return l

for lines in file("t.txt"):
    l = Cut(list(cutlist), list(lines.decode('gbk')))
    for line in l:
        if line.strip() != "":
            li = line.strip().split()
            for sentence in li:
                print sentence
Can anyone give me some guidance on what is causing this error? Thanks!
Answer 0 (score: 0)
I changed the decode call from 'gbk' to 'utf-8', as follows:
l = Cut(list(cutlist),list(lines.decode('utf-8')))
It works now.
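
For anyone hitting the same problem, here is a minimal sketch of why the change probably helps. It assumes t.txt is actually saved as UTF-8, which the post does not state but which would be consistent with the fix working. The names split_sentences, decodes_as, read_sentences and raw_line are made up for this illustration and are not part of the original script. The sample bytes stand in for one line of the file.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io

# Same terminator characters as cutlist in the question.
TERMINATORS = "。!?"


def split_sentences(text, terminators=TERMINATORS):
    """Split unicode text into sentences, keeping each terminator.

    Like Cut() in the question, trailing text with no terminator is dropped.
    """
    sentences = []
    current = []
    for char in text:
        current.append(char)
        if char in terminators:
            sentences.append("".join(current))
            current = []
    return sentences


def decodes_as(raw, encoding):
    """Return True if the byte string decodes cleanly with the given codec."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def read_sentences(path, encoding="utf-8"):
    """Read a text file with an explicit encoding and yield its sentences."""
    # io.open decodes while reading, so no manual .decode() call is needed.
    with io.open(path, encoding=encoding) as handle:
        for line in handle:
            for sentence in split_sentences(line.strip()):
                yield sentence


# Hypothetical sample: one line of text as it would be stored in a UTF-8 file.
raw_line = "你好。再见!".encode("utf-8")

print(decodes_as(raw_line, "utf-8"))  # the bytes are valid UTF-8
print(decodes_as(raw_line, "gbk"))    # the same bytes are not valid GBK
for sentence in split_sentences(raw_line.decode("utf-8")):
    print(sentence)
```

If the file really is UTF-8, something like read_sentences("t.txt") would replace both the file() loop and the manual decode call. If the file is in another encoding, pass that encoding instead.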