ConvergenceWarning with LinearSVC: increase the number of iterations?

Time: 2021-07-24 18:48:15

Tags: python scikit-learn

This is the error I'm getting:

ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. warnings.warn("Liblinear failed to converge, increase "

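I've been working with the romance and news categories of the brown dataset from nltk.corpus, and until now I hadn't run into any problems. This is the code I'm running: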

import nltk
from nltk.corpus import brown
from nltk import pos_tag_sents
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn

for cat in brown.categories():
    print(cat)

news_sent = brown.sents(categories=["news"])
romance_sent = brown.sents(categories=["romance"])

ndf = pd.DataFrame({'label':'news', 'sentence':news_sent})
rdf = pd.DataFrame({'label':'romance', 'sentence':romance_sent})

df = pd.concat([ndf, rdf])

df.head()

df['label'].value_counts()

fig, ax = plt.subplots()
_ = df['label'].value_counts().plot.bar(ax=ax, rot=0)
fig.savefig("categories_counts.png", bbox_inches = 'tight', pad_inches = 0)

pos_all = pos_tag_sents(df['sentence'])

def countPOS(pos_tag_sent, POS):
    # Count, per sentence, the tags whose first two characters equal POS
    # (so 'NN' also matches 'NNS', 'NNP', ...).
    all_pos_counts = []
    for sentence in pos_tag_sent:
        pos_count = 0
        for word in sentence:
            tag = word[1]
            if tag[:2] == POS:
                pos_count = pos_count + 1
        all_pos_counts.append(pos_count)
    return all_pos_counts

df['NN'] = countPOS(pos_all, 'NN')
df['JJ'] = countPOS(pos_all, 'JJ')

df.groupby('label').sum()

df.to_csv("df_news_romance.csv", index=False)

df = pd.read_csv("df_news_romance.csv")

fv = df[["NN", "JJ"]]

df['label'].value_counts()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fv, df['label'],
                                                    stratify=df['label'],
                                                    test_size=0.25,
                                                    random_state=42)

print(X_train.shape)
print(X_test.shape)

from sklearn.svm import LinearSVC
classifier = LinearSVC()

classifier.fit(X_train, y_train)

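At this point I get the error above. To add more information to the original post: I've tried increasing max_iter and setting LinearSVC(dual=False), but neither made any difference. Any help would be greatly appreciated!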

1 Answer:

Answer 0: (score: 0)

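You may need to set LinearSVC(dual=False) when your data has more samples than features, which is the case here (thousands of sentences, two features). In the scikit-learn versions current at the time of this question, LinearSVC defaults to dual=True, meaning it solves the dual optimization problem. You can also try increasing the maximum number of iterations (e.g. max_iter=10000).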
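
As a rough sketch of the two suggestions combined, the snippet below fits LinearSVC(dual=False, max_iter=10000) and checks whether a ConvergenceWarning was raised. The data is synthetic: the Poisson-distributed NN and JJ counts are an assumption standing in for the question's df_news_romance.csv, so the accuracy it prints says nothing about the Brown corpus.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's NN/JJ counts (assumed data, not the Brown corpus).
rng = np.random.default_rng(42)
n = 2000
labels = np.array(["news"] * (n // 2) + ["romance"] * (n // 2))
nn_counts = np.where(labels == "news", rng.poisson(7, n), rng.poisson(3, n))
jj_counts = np.where(labels == "news", rng.poisson(2, n), rng.poisson(1, n))
fv = pd.DataFrame({"NN": nn_counts, "JJ": jj_counts})

X_train, X_test, y_train, y_test = train_test_split(
    fv, labels, stratify=labels, test_size=0.25, random_state=42
)

# Far more samples than features, so solve the primal problem (dual=False)
# and allow more iterations than the default of 1000.
classifier = LinearSVC(dual=False, max_iter=10000)

# Record warnings during fitting so we can see whether liblinear converged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    classifier.fit(X_train, y_train)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
accuracy = classifier.score(X_test, y_test)
print("converged without warning:", converged)
print("test accuracy:", round(accuracy, 3))
```

If the warning still appears on the real data, printing the caught warnings in the same way makes it easy to confirm that the new parameters are actually the ones being used when fit is called.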