Sklearn: 'str' object has no attribute 'read'

Time: 2016-10-20 18:37:09

Tags: python scikit-learn countvectorizer

I want to use Sklearn to vectorize the data in a large CSV file. I tried the following code.

First try:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(input='file', stop_words = 'english', ngram_range=(1,2))

vectorizer.fit_transform('test.csv')

But I got this error:

AttributeError: 'str' object has no attribute 'read'

Second try, which raised the same error:

import csv

file = open('test.csv', 'r')

f = file.readline()

vectorizer.fit_transform(f)

Third try: this one ran, but the process was killed because it ran out of memory.

from sklearn.feature_extraction.text import TfidfVectorizer

file = open('test.csv', 'r')
a = file.read()
vectorizer = TfidfVectorizer(stop_words = 'english', ngram_range=(1,2))
de = vectorizer.fit_transform(a.split('\n'))

How can I use fit_transform in Sklearn to process a large CSV file?

1 answer:

Answer 0 (score: 0)

You set input='file', but in both attempts you passed it a string (file.readline() returns the first line of the file as a string).

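Instead, give it a file object. With input='file', each item of the sequence you pass must have a read method, so wrap the open file in a list.

Do the following:

file = open('test.csv', 'r')
# One file object is one document; the vectorizer calls read() on it.
vectorizer.fit_transform([file])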
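
If the goal is one document per CSV row rather than one document per file, the vectorizer's default input='content' may be a better fit, because it accepts any iterable of strings, including a generator that reads the file lazily. The following is a minimal sketch, not a definitive solution. It assumes that all cells of a row should be joined into a single document; iter_rows is a hypothetical helper, and the in-memory sample stands in for test.csv.

```python
import csv
import io

from sklearn.feature_extraction.text import CountVectorizer


def iter_rows(handle):
    """Yield one text document per CSV row, reading the file lazily."""
    for row in csv.reader(handle):
        # Assumption: every cell of the row belongs to the same document.
        yield " ".join(row)


# Stand-in for open('test.csv', newline=''); the contents are made up.
sample = io.StringIO("the quick brown fox\nthe lazy dog sleeps\n")

# Default input='content': each item of the iterable is one document string.
vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
matrix = vectorizer.fit_transform(iter_rows(sample))

print(matrix.shape)
```

This avoids holding the raw text in memory, but the vocabulary and the resulting sparse matrix still have to fit. If they do not, HashingVectorizer from the same module, which keeps no vocabulary, may be worth trying.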