Question

I want to use scikit-learn to vectorize the data in a large CSV file. I used the following code:

First try:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(input='file', stop_words = 'english', ngram_range=(1,2))

vectorizer.fit_transform('test.csv')

But I got this error:

AttributeError: 'str' object has no attribute 'read'

Second try, which still raised the same error:

import csv

file = open('test.csv', 'r')

f = file.readline()

vectorizer.fit_transform(f)

Third try: this one ran, but the process was killed because it ran out of memory.

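from sklearn.feature_extraction.text import TfidfVectorizer

file = open('test.csv', 'r')
a = file.read()  # loads the entire file into memory at once
vectorizer = TfidfVectorizer(stop_words = 'english', ngram_range=(1,2))
de = vectorizer.fit_transform(a.split('\n'))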

How can I use fit_transform in scikit-learn to process a large CSV file?

Answer 1

You declared your input as 'file', yet in both cases you gave it a string (file.readline() returns the first line of the file as a string).

Instead, give it a file.

Do the following:

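# input='file' expects an iterable of file-like objects, so wrap the file in a list
with open('test.csv', 'r') as file:
    vectorizer.fit_transform([file])  # the whole file is treated as one document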
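
The third attempt in the question treats each line as a separate document and ran out of memory after reading the whole file. One possible alternative is to leave input at its default ('content') and pass a lazy iterable of strings, so that rows are read one at a time. This is a minimal sketch, assuming the text lives in the first CSV column; iter_rows_text, vectorize_csv, and text_column are hypothetical names used for illustration, not part of scikit-learn:

```python
import csv


def iter_rows_text(path, text_column=0):
    """Yield one text field per CSV row without loading the whole file."""
    with open(path, newline='', encoding='utf-8') as fh:
        for row in csv.reader(fh):
            # Skip blank rows and rows too short to contain the column.
            if len(row) > text_column:
                yield row[text_column]


def vectorize_csv(path, text_column=0):
    """Fit a CountVectorizer on rows streamed from the CSV file."""
    # Imported here so the generator above is usable without scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer

    # Default input='content': each item of the iterable is one document.
    vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 2))
    matrix = vectorizer.fit_transform(iter_rows_text(path, text_column))
    return vectorizer, matrix
```

Calling vectorize_csv('test.csv') would return the fitted vectorizer and a sparse matrix with one row per CSV row. Streaming only avoids holding the raw text in memory; the unigram and bigram vocabulary is still kept in full, so if that is what exhausts memory, scikit-learn's HashingVectorizer, which stores no vocabulary, may be worth trying.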

Sklearn: 'str' object has no attribute 'read'
