SVM ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

Time: 2020-02-10 08:06:30

Tags: python pandas scikit-learn svm

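Please help me with this! I don't know why this error occurs when I enter some text to be classified.

This is the code I use to train on my data. How can I fix it?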

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)

from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf,y_train)

if request.method == 'POST':
    message = request.form['message']
    data = [message]
    vect = vectorizer.transform(data).toarray()
    my_prediction = clf.predict(vect)

return render_template('result.html',prediction = my_prediction)

1 answer:

Answer 0 (score: 0)

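  1. Use your_data.isnull().any() to check whether your data contains null values. If it does, drop them with your_data = your_data.dropna().

  2. Use np.isfinite(your_data) to check whether your data contains inf values. If it does, replace them with NaN using your_data = your_data.replace([np.inf, -np.inf], np.nan), then drop them with your_data = your_data.dropna().

    Replace your_data with the name of whatever DataFrame you are using, e.g. X or y. (X_train_tfidf is a SciPy sparse matrix rather than a DataFrame, so the pandas methods above do not apply to it directly.)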
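The two checks above can be combined into a small helper. This is only a minimal sketch, assuming the data is a pandas DataFrame; the function names count_bad_values and drop_bad_rows and the sample frame are hypothetical, made up for illustration. It uses isin instead of np.isfinite because np.isfinite raises a TypeError on text (object) columns.

```python
import numpy as np
import pandas as pd


def count_bad_values(df: pd.DataFrame) -> pd.DataFrame:
    """Count NaN/None and +/-inf entries per column."""
    nan_counts = df.isnull().sum()
    # isin also works on text columns, where np.isfinite would raise TypeError
    inf_counts = df.isin([np.inf, -np.inf]).sum()
    return pd.DataFrame({"nan": nan_counts, "inf": inf_counts})


def drop_bad_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Turn +/-inf into NaN, then drop every row that contains a NaN."""
    return df.replace([np.inf, -np.inf], np.nan).dropna()


# Hypothetical sample data: one missing text and one infinite label
sample = pd.DataFrame({
    "text": ["good movie", None, "bad movie", "fine"],
    "label": [1.0, 0.0, np.inf, 0.0],
})

report = count_bad_values(sample)
cleaned = drop_bad_rows(sample)
print(report)
print(cleaned)
```

Because drop_bad_rows works on whole rows of one frame, the text and label columns stay aligned after cleaning.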

Please also check this answer, which was flagged as a possible duplicate in the comments on the post.


Edit: added a sample as requested. Applying this to X and y is the most obvious thing to do.

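import numpy as np
from sklearn.model_selection import train_test_split
# Add these lines (assumes X and y are pandas Series sharing the same index)
X = X.replace([np.inf, -np.inf], np.nan)
y = y.replace([np.inf, -np.inf], np.nan)
# Drop a row from both X and y if either is missing, so they stay aligned
mask = X.notna() & y.notna()
X = X[mask]
y = y[mask]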

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)

from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf,y_train)

if request.method == 'POST':
    message = request.form['message']
    data = [message]
    vect = vectorizer.transform(data).toarray()
    my_prediction = clf.predict(vect)

return render_template('result.html',prediction = my_prediction)
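When X and y are cleaned, rows have to be removed from both at once, otherwise the texts and labels no longer line up. The following is a minimal sketch of that idea, assuming X is a pandas Series of texts and y a pandas Series of labels with the same index; the helper name clean_xy and the toy data are hypothetical. It also checks the TF-IDF matrix right before fitting, which can help narrow down where non-finite values enter.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def clean_xy(X: pd.Series, y: pd.Series):
    """Drop rows where the text or the label is NaN/inf, keeping X and y aligned."""
    y = y.replace([np.inf, -np.inf], np.nan)
    keep = X.notna() & y.notna()
    return X[keep], y[keep]


# Hypothetical toy data: row 1 has no text, row 3 has no label
X = pd.Series(["spam offer now", None, "hello friend", "win money", "see you soon"])
y = pd.Series([1.0, 0.0, 0.0, np.nan, 0.0])

X_clean, y_clean = clean_xy(X, y)

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(X_clean)

# The TF-IDF matrix is sparse, so inspect its stored values via .data
assert np.isfinite(X_tfidf.data).all()
assert X_tfidf.shape[0] == len(y_clean)

clf = LinearSVC()
clf.fit(X_tfidf, y_clean)
prediction = clf.predict(vectorizer.transform(["win money now"]))
print(prediction)
```

If either assertion fails, the problem lies in the data or the vectorizing step rather than in LinearSVC itself.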