Question

I want to import data from a text file and build a vector-space (bag-of-words) representation of its words:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
bag_of_words = vectorizer.fit(f)
bag_of_words = vectorizer.transform(f)
print(bag_of_words)

But I get this error:

Traceback (most recent call last):
  File "D:\test\test.py", line 5, in <module>
    bag_of_words = vectorizer.fit(f)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 776, in fit
    self.fit_transform(raw_documents)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
    for feature in analyze(doc):
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 110, in decode
    doc = doc.read()
AttributeError: 'str' object has no attribute 'read'

Any ideas?

Answer 1

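vectorizer.fit expects an iterable of documents, and with input="file" each document must be a file object. So pass a list containing the file, not the file itself: vectorizer.fit([f]). If you pass f directly, fit iterates over it line by line, receives each line as a plain str, and then calls .read() on that string, which is exactly the AttributeError: 'str' object has no attribute 'read' in your traceback.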
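A stdlib-only sketch of that failure, with io.StringIO standing in for the opened file (an assumption for illustration; a real text file object also iterates line by line). It does not call scikit-learn; it only shows what the vectorizer receives as "documents" in each case:

```python
import io

# io.StringIO stands in for open('D:\\test\\17.txt'); iterating over a
# text file object yields its lines as plain strings.
f = io.StringIO("the cat sat\non the mat\n")

# Passing f itself: these lines are what get treated as documents.
docs = list(f)
print(docs)  # ['the cat sat\n', 'on the mat\n']

# With input="file", each document must have .read(); a str does not.
print(hasattr(docs[0], "read"))  # False

# Passing [f] instead: a single document that does have .read().
wrapped = [f]
print(all(hasattr(doc, "read") for doc in wrapped))  # True
```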

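Also, you can't reuse f for a second call to vectorizer.transform, because by then the file has already been read to the end. What you probably want is the following: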

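from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(input="file")
with open('D:\\test\\17.txt') as f:
    # A list of file objects: each file is one document (one row)
    bag_of_words = vectorizer.fit_transform([f])
print(bag_of_words)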
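The point about reuse can be seen without scikit-learn. The sketch below is a simplified pure-Python stand-in for what CountVectorizer(input="file") does with each document (call .read(), then count tokens), not the library's actual code. io.StringIO plays the role of the opened file, and the regex is assumed to mirror CountVectorizer's default token_pattern, r"(?u)\b\w\w+\b", applied after lowercasing. The helper name count_words is hypothetical:

```python
import io
import re
from collections import Counter

# Assumed to mirror CountVectorizer's default: tokens of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_words(file_obj):
    """Hypothetical helper: read the whole file object and count its tokens."""
    text = file_obj.read()  # consumes the file from the current position
    return Counter(TOKEN_RE.findall(text.lower()))


f = io.StringIO("The cat sat on the mat. A cat!")

first = count_words(f)   # reads everything; "a" is dropped (only 1 character)
second = count_words(f)  # file already at its end: read() returns "", no tokens
f.seek(0)                # rewind to the start
third = count_words(f)   # same counts as the first pass

print(len(first), len(second))  # 5 0
```

With a real file object the same rule applies: to run fit([f]) and transform([f]) as separate calls, rewind with f.seek(0) between them while the file is still open, or use fit_transform, which reads the file only once.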

Error creating vectors from a text file in Python
