UnicodeDecodeError: 'gbk' codec can't decode bytes in Chinese text

Asked: 2015-10-22 21:04:04

Tags: python unicode decode splitter

Environment:
- Mac OS Yosemite
- Python 2.7
- The file I am reading is saved in txt format

I have a script that splits Chinese text into sentences. Here is the code:

# coding: utf-8 

# Sentence-ending punctuation, decoded to a unicode string
cutlist = "。!?".decode('utf-8')

def FindToken(cutlist, char):
    # True if char is one of the sentence-ending punctuation marks
    return char in cutlist


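def Cut(cutlist, lines):
    # Split a sequence of characters into sentences, keeping the
    # terminating punctuation mark with each sentence.
    l = []      # completed sentences
    line = []   # characters of the sentence being built

    for i in lines:
        if FindToken(cutlist, i):
            line.append(i)
            l.append(''.join(line))
            line = []
        else:
            line.append(i)
    return l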


for lines in file("t.txt"):
    l = Cut(list(cutlist), list(lines.decode('gbk')))
    for line in l:
        if line.strip() != "":
            li = line.strip().split()
            for sentence in li:
                print sentence

However, I get the error named in the title: a UnicodeDecodeError raised by the 'gbk' codec.

Could someone give me some guidance on what causes this error? Thanks!

1 Answer:

Answer 0: (score: 0)

I changed the decoding to utf-8, as follows:

l = Cut(list(cutlist),list(lines.decode('utf-8')))  

It works now, which suggests the file was saved as UTF-8 rather than GBK.
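For anyone hitting the same error without knowing how the file was saved, here is a minimal sketch of trying candidate encodings in order. The helper name decode_with_fallback and the candidate list ("utf-8", "gbk") are assumptions made for illustration and are not part of the original script. UTF-8 is tried first because GBK-encoded Chinese bytes are very unlikely to form valid UTF-8, whereas UTF-8 bytes can sometimes decode as GBK into garbage.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def decode_with_fallback(data, encodings=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is an assumption and should be
    adjusted to the encodings you actually expect to see.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


sample = u"你好。今天天气好!"
utf8_bytes = sample.encode("utf-8")  # bytes as stored in a UTF-8 file
gbk_bytes = sample.encode("gbk")     # bytes as stored in a GBK file

text, used = decode_with_fallback(utf8_bytes)
print(used)

text2, used2 = decode_with_fallback(gbk_bytes)
print(used2)
```

When the encoding is already known, io.open("t.txt", encoding="utf-8") is available in both Python 2.7 and Python 3 and yields unicode lines directly, so no explicit decode call is needed.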