Changing phrases to vectors using a while loop in Python

Asked: 2016-12-03 19:39:55

Tags: python scikit-learn

I would like to convert the following phrases to vectors with sklearn:

Article 1. It is not good to eat pizza after midnight
Article 2. I wouldn't survive a day withouth stackexchange
Article 3. All of these are just random phrases
Article 4. To prove if my experiment works.
Article 5. The red dog jumps over the lazy fox

I wrote the following code:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=1)

n=0
while n < 5:
   n = n + 1
   a = ('Article %(number)s' % {'number': n})
   print(a)
   with open("LISR2.txt") as openfile:
     for line in openfile:
       if a in line:
           X=line
           print(vectorizer.fit_transform(X))

This gives me the following error:

ValueError: Iterable over raw text documents expected, string object received.

Why is this happening? I know this should work, because if I pass the phrases in directly:

X=("It is not good to eat pizza","I wouldn't survive a day", "All of these")
print(vectorizer.fit_transform(X))

it gives me the desired vectors.

1 answer:

Answer 0: (score: 7)

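This error occurs when you pass a raw string directly to fit_transform, which expects an iterable of documents rather than a single string. Wrap the string in a list, Y = [X], and pass Y as the argument instead; it will then work as expected. I ran into the same problem.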
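A minimal sketch of that fix, assuming the file holds one article per line. The `lines` list is a hypothetical stand-in for the contents of LISR2.txt so the example runs without the file. It also shows an alternative that may be closer to what the question wants: fitting once on all lines, so every row shares the same vocabulary instead of each line getting its own.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the lines read from LISR2.txt
# (assumption: one article per line).
lines = [
    "Article 1. It is not good to eat pizza after midnight",
    "Article 2. I wouldn't survive a day withouth stackexchange",
    "Article 3. All of these are just random phrases",
]

vectorizer = CountVectorizer(min_df=1)

# Wrapping a single string in a list turns it into an iterable
# containing one document, which is what fit_transform expects.
single = vectorizer.fit_transform([lines[0]])
print(single.shape)

# Fitting once on the whole list builds one shared vocabulary,
# so the rows of the resulting matrix are directly comparable.
matrix = vectorizer.fit_transform(lines)
print(matrix.shape)
```

Inside the original loop, the equivalent change would be `vectorizer.fit_transform([line])`. Note that calling fit_transform on each line separately refits the vocabulary every time, so the resulting vectors would not share columns.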