I want to import data from a text file and build a word-based vector space (bag-of-words) representation of it:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
bag_of_words = vectorizer.fit(f)
bag_of_words = vectorizer.transform(f)
print(bag_of_words)
But I get this error:
Traceback (most recent call last):
File "D:\test\test.py", line 5, in <module>
bag_of_words = vectorizer.fit(f)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 776, in fit
self.fit_transform(raw_documents)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
self.fixed_vocabulary_)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
for feature in analyze(doc):
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 110, in decode
doc = doc.read()
AttributeError: 'str' object has no attribute 'read'
Any ideas?
Answer 0 (score: 0)
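With input="file", the vectorizer.fit method expects an iterable of file objects (not a single file object). When you pass f directly, the vectorizer iterates over it and receives each line as a str, which has no read method; that is where the AttributeError comes from. You should call vectorizer.fit([f]) instead.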
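Also, you cannot reuse f in the second call, vectorizer.transform, because the file has already been read to the end by that point. What you probably want is the following: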
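from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(input="file")
# Wrap the file object in a list and fit and transform in a single pass
with open('D:\\test\\17.txt') as f:
    bag_of_words = vectorizer.fit_transform([f])
print(bag_of_words)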
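The two points above can be checked without scikit-learn. The following is a minimal sketch that uses an in-memory io.StringIO object as a stand-in for the opened file; the sample text is made up for illustration and is not the content of 17.txt.

```python
import io

# Stand-in for open('D:\\test\\17.txt'); the text is invented for this sketch.
f = io.StringIO("first line of text\nsecond line of text\n")

# Iterating over a file object yields its lines as plain str objects,
# so anything that iterates over f sees strings, not file objects.
lines = [item for item in f]
all_str = all(isinstance(item, str) for item in lines)
has_read = any(hasattr(item, "read") for item in lines)

# The file position is now at end-of-file, so a second read returns nothing.
second_read = f.read()

# Rewinding with seek(0) makes the content available again.
f.seek(0)
after_rewind = f.read()

print(all_str, has_read, repr(second_read), len(after_rewind))
```

If you do need separate fit and transform calls on the same file object, rewinding it with f.seek(0) between the two calls should work for the same reason, though fit_transform avoids the issue entirely.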