Question

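I am working with a dataset and re-running a colleague's code. When tokenizing the text data, the code shown below does not work on my MacBook, although it runs fine on my colleague's computer. The code and the error are below.

I don't know which version he has, but mine is Python 3.6. Is this a version problem?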

s=title+' '+author+' '+text
tokens=word_tokenize(s.decode('ascii','ignore').lower())

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-e50403f82604> in <module>
     10         flushPrint(m/100)#208
     11     s=title+' '+author+' '+text
---> 12     tokens=word_tokenize(s.decode('ascii','ignore').lower())
     13     tokens = [z for z in tokens if not z in stopset and len(z)>1]
     14     k=[]

AttributeError: 'str' object has no attribute 'decode'

Answer 1

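The problem is most likely due to the changes between Python 2 and Python 3.

In Python 2:

'' is of type str (a byte string), so it supports ''.decode()
u'' is of type unicode, so it supports u''.encode()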

In Python 3 it is the other way around:

'' is of type str (Unicode text), so it supports ''.encode()
b'' is of type bytes, so it supports b''.decode()
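
As a small illustration of the Python 3 behavior described above, the sketch below converts between str and bytes and checks that str has no decode method. The variable names are made up for the example.

```python
# Python 3: text is str (Unicode), binary data is bytes.
text = "café"                       # str
data = text.encode("utf-8")         # str -> bytes
round_trip = data.decode("utf-8")   # bytes -> str

# str has no decode() method in Python 3, which is why the
# question's code raises AttributeError.
has_decode = hasattr(text, "decode")
print(type(text).__name__, type(data).__name__, has_decode)
```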

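So in your case, depending on the types of your variables, you may need to do something like the following (this form works only if title, author, and text are all bytes objects):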

s = title + b' ' + author + b' ' + text

Or just fall back to Python 2 :)
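
If the variables are already str in Python 3, one possible way to keep the original intent (drop non-ASCII characters, then lowercase) is sketched below. The helper name to_ascii_lower and the sample values are hypothetical, and the nltk call from the question is left as a comment so the snippet runs without nltk installed.

```python
def to_ascii_lower(s):
    """Hypothetical helper: drop non-ASCII characters and lowercase.

    Accepts either bytes or str, so it behaves the same regardless of
    which type the input data happens to have.
    """
    if isinstance(s, bytes):
        s = s.decode("ascii", "ignore")
    else:
        # Round-trip through bytes to discard non-ASCII characters.
        s = s.encode("ascii", "ignore").decode("ascii")
    return s.lower()


# Sample values, assumed for illustration only.
title, author, text = "Café Society", "José", "Some TEXT"
s = title + " " + author + " " + text
cleaned = to_ascii_lower(s)
# tokens = word_tokenize(cleaned)  # the nltk call from the question
```

If the non-ASCII characters do not need to be removed, calling s.lower() directly on the str is enough.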
