Question

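I'm a Python noob trying to vectorise a string, with no luck. So far I have extracted the text of an article from a URL, and now I'm trying to classify that article, but it hasn't worked yet.

(I keep getting the error: raise AttributeError(attr + " not found") AttributeError: lower not found)

Nothing seems to help.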

    url = input("Paste the website containing the article you want to analyse here: ")
    print "Analysing Webpage"
    #Gets the URL from the extension
    #Goose loaded
    g = Goose()
    #Extract the text and feed it to the classifier
    article = g.extract(url=url)
    article = article.cleaned_text
    article = clean(article)
    article = str(article)
    print "Vectorising Text"
    article = article.split()
    vect = CountVectorizer(min_df=0., max_df=1.0)
    X = vect.fit_transform(article)
    X.toarray()
    X = vect.transform(X).toarray()
    print X
    print "Predicting Political Bias"
    loaded_model = pickle.load(open("text_clf_svm.pkl", 'rb'))
    predicted_svm = loaded_model.predict(X)
    print predicted_svm

Any kind of help or pointers would be very welcome. Thanks =)

Answer 1

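You appear to be applying fit_transform to your text. This produces an X matrix that differs from the X matrix you (or someone else) trained the classifier on. The two need to be "aligned", meaning they must share the same vocabulary and the same column order.

In your case, you have "lower" in your X matrix, but the model was trained on a matrix that does not contain this word.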
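To make the alignment point concrete, here is a minimal pure-Python sketch of what fitting and transforming do conceptually. The helper names (fit_vocabulary, transform) and the toy sentences are hypothetical and only mimic the idea behind CountVectorizer; they are not its actual implementation.

```python
from collections import Counter


def fit_vocabulary(documents):
    """Map each distinct lowercase token in the corpus to a column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def transform(documents, vocabulary):
    """Count tokens per document against a fixed vocabulary.

    Tokens that are not in the vocabulary are silently dropped.
    """
    ordered_tokens = sorted(vocabulary, key=vocabulary.get)
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        rows.append([counts.get(tok, 0) for tok in ordered_tokens])
    return rows


# Hypothetical toy data, for illustration only.
training_corpus = ["taxes should be lower", "taxes should be higher"]
new_article = ["lower taxes help growth"]

# Vocabulary learned once, at training time.
train_vocab = fit_vocabulary(training_corpus)

# Aligned: the new text is counted against the training vocabulary.
aligned = transform(new_article, train_vocab)

# Misaligned: refitting on the new text yields different columns.
refit_vocab = fit_vocabulary(new_article)
misaligned = transform(new_article, refit_vocab)

print(len(train_vocab), len(aligned[0]), len(misaligned[0]))
```

The aligned row has one column per training-vocabulary word, so a model trained on that vocabulary can consume it. The refitted row has a different width and different column meanings, which is the mismatch described above.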

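So there are two options. Either the model was trained with a CountVectorizer, in which case you only need to apply transform with that same fitted vectorizer, or you should use fit_transform, but on the full corpus when training the model, and then reuse that fitted vectorizer later in production.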
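A minimal sketch of that workflow with scikit-learn follows. The toy corpus, the labels, and the choice of LinearSVC are assumptions made for illustration; the original training data and the contents of text_clf_svm.pkl are not known. Note that transform expects an iterable of raw strings, so the article is wrapped in a list rather than split into words. Passing an already-vectorised matrix to transform instead is a likely source of the "lower not found" error, since the vectorizer lowercases each document it receives.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical training corpus and labels, for illustration only.
train_texts = [
    "taxes should be lower for everyone",
    "cut spending and lower taxes",
    "public services need more funding",
    "raise funding for public healthcare",
]
train_labels = ["right", "right", "left", "left"]

# Training time: fit the vectorizer on the full corpus, then train the model.
vect = CountVectorizer()
X_train = vect.fit_transform(train_texts)
clf = LinearSVC().fit(X_train, train_labels)

# Persist the fitted vectorizer together with the model
# (kept in memory here; in practice this would be written to a file).
blob = pickle.dumps((vect, clf))

# Prediction time: load both and call transform only, never fit_transform.
vect_loaded, clf_loaded = pickle.loads(blob)
article_text = "the article argues that taxes should be lower"
X_new = vect_loaded.transform([article_text])  # one document, one row

predicted = clf_loaded.predict(X_new)
print(X_new.shape, predicted)
```

If the pickled object happens to be a scikit-learn Pipeline that already contains the vectorizer, the raw text can be passed directly to its predict method as a one-element list, with no separate vectorising step.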

I hope this helps,

Regards, Nicolas

Vectorise a string
