Question

Hi, I wrote the code below to do sentiment analysis:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

import time
analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

with open('EVG_text mining.txt', mode='rb') as f: 
    bytes = f.read()
    text = bytes.decode('utf-8', 'ignore') 
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if not vs['neg'] > 0.1:
            if vs['pos']-vs['neg'] > 0:
                pos_correct += 1
            pos_count +=1


neg_count = 0
neg_correct = 0

with open('EVG_text mining.txt', mode='rb') as f: 
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if not vs['pos'] > 0.1:
            if vs['pos']-vs['neg'] <= 0:
                neg_correct += 1
            neg_count +=1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))

However, I get an error:

Traceback (most recent call last):
  File "<ipython-input-9-62462b5174b4>", line 12, in <module>
    for line in f.read().split('\n'):
TypeError: a bytes-like object is required, not 'str'

How can I fix this?

Answer 1

You have opened the file in binary mode ('rb'). Reading from it returns bytes, not str.
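A minimal sketch of the difference, using only the standard library: it writes a throwaway temporary file (a stand-in for 'EVG_text mining.txt', with made-up sample lines) and reads it back once in binary mode and once in text mode.

```python
import os
import tempfile

# Write a small UTF-8 file to a temporary location (stand-in for the real data file)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("good movie\nbad movie\n")

# Binary mode: read() returns bytes
with open(path, mode="rb") as f:
    raw = f.read()

# Text mode with an explicit encoding: read() returns an already-decoded str
with open(path, encoding="utf-8") as f:
    text = f.read()

print(type(raw).__name__, type(text).__name__)  # bytes str
os.remove(path)
```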

This line:

bytes = f.read()

reads the entire file into a variable named bytes. Don't do that: Python already has a built-in named bytes, and by reusing that name you are "shadowing" the built-in.
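To illustrate what shadowing means in practice, here is a small sketch: once the name bytes is bound to some file contents, the built-in bytes type can no longer be called under that name. Simply choosing another variable name (for example data or raw) avoids the problem.

```python
# Binding the name `bytes` at module level hides the built-in bytes type
bytes = b"some file contents"

shadow_error = None
try:
    bytes([104, 105])  # with the built-in, this would build b'hi'
except TypeError as exc:
    shadow_error = str(exc)
print("TypeError:", shadow_error)

# Deleting the module-level name makes the built-in visible again
del bytes
restored = bytes([104, 105])
print(restored)  # b'hi'
```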

Then you go on to decode the bytes:

text = bytes.decode('utf-8', 'ignore')

But then you end up reading the file a second time anyway!

for line in f.read().split('\n'):

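Because the file has already been read to the end, this second f.read() returns an empty bytes object (b''). Calling .split('\n') on it then raises the error you see, because bytes.split() requires a bytes separator (b'\n'), not a str ('\n'). The same TypeError would occur even if the bytes object were not empty.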
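Both failure points can be reproduced without the real file. This sketch uses io.BytesIO as a stand-in for a file opened with mode='rb'; the sample text is made up for illustration.

```python
import io

# io.BytesIO behaves like a file opened in 'rb' mode
f = io.BytesIO(b"good movie\nbad movie\n")

first = f.read()   # the whole content
second = f.read()  # the stream is already exhausted, so this is b''
print(repr(first), repr(second))

# bytes.split() needs a bytes separator; a str separator raises TypeError,
# whether or not the bytes object is empty
error_message = None
try:
    second.split('\n')
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# With a bytes separator the split works
print(first.split(b'\n'))  # [b'good movie', b'bad movie', b'']
```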

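I suggest not reading the file up front at all. Instead, open it in text mode; then you don't have to decode or split anything, because iterating over the file yields the data line by line, already decoded: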

with open('EVG_text mining.txt', encoding='utf-8') as f:
    for line in f:  # lines come already decoded, as str
        vs = analyzer.polarity_scores(line)
        # ... rest of the loop body unchanged
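For completeness, here is one way the question's counting logic might look once the file is read in text mode. It is a sketch, not the asker's code or part of vaderSentiment: count_polarity, accuracy and PolarityScorer are hypothetical names, the two passes over the file are merged into one (the counting rules themselves are unchanged), and the analyzer is passed in as a parameter so the function can be exercised without vaderSentiment installed. Using errors='ignore' is an assumption chosen to mirror the question's decode('utf-8', 'ignore').

```python
from typing import Protocol, Tuple


class PolarityScorer(Protocol):
    """Hypothetical: anything with a VADER-style polarity_scores(text) -> dict method."""
    def polarity_scores(self, text: str) -> dict: ...


def count_polarity(path: str, analyzer: PolarityScorer) -> Tuple[int, int, int, int]:
    """Apply the question's positive and negative counting rules in a single pass."""
    pos_count = pos_correct = neg_count = neg_correct = 0
    # Text mode: each line arrives as an already-decoded str.
    # errors="ignore" (assumption) mirrors bytes.decode('utf-8', 'ignore') in the question.
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            # Drop the trailing newline, like the pieces produced by split('\n')
            vs = analyzer.polarity_scores(line.rstrip("\n"))
            # Positive rule from the question
            if not vs["neg"] > 0.1:
                if vs["pos"] - vs["neg"] > 0:
                    pos_correct += 1
                pos_count += 1
            # Negative rule from the question
            if not vs["pos"] > 0.1:
                if vs["pos"] - vs["neg"] <= 0:
                    neg_correct += 1
                neg_count += 1
    return pos_count, pos_correct, neg_count, neg_correct


def accuracy(correct: int, count: int) -> float:
    """Percentage of correct samples, returning 0.0 instead of dividing by zero."""
    return correct / count * 100.0 if count else 0.0
```

With vaderSentiment installed, usage would be along the lines of pos_count, pos_correct, neg_count, neg_correct = count_polarity('EVG_text mining.txt', SentimentIntensityAnalyzer()), followed by the question's two print() calls using accuracy(pos_correct, pos_count) and accuracy(neg_correct, neg_count).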

VaderSentiment error: TypeError: a bytes-like object is required, not 'str'
