Here is the error I'm getting:
ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. warnings.warn("Liblinear failed to converge, increase "
I've been working with the romance and news categories of the Brown corpus from nltk.corpus, and so far I haven't had any problems. Here is the code I'm running:
import nltk
from nltk.corpus import brown
from nltk import pos_tag_sents
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn
for cat in brown.categories():
    print(cat)
news_sent = brown.sents(categories=["news"])
romance_sent = brown.sents(categories=["romance"])
ndf = pd.DataFrame({'label':'news', 'sentence':news_sent})
rdf = pd.DataFrame({'label':'romance', 'sentence':romance_sent})
df = pd.concat([ndf, rdf])
df.head()
df['label'].value_counts()
fig, ax = plt.subplots()
_ = df['label'].value_counts().plot.bar(ax=ax, rot=0)
fig.savefig("categories_counts.png", bbox_inches='tight', pad_inches=0)
pos_all = pos_tag_sents(df['sentence'])
def countPOS(pos_tag_sent, POS):
    pos_count = 0
    all_pos_counts = []
    for sentence in pos_tag_sent:
        for word in sentence:
            tag = word[1]
            if tag[:2] == POS:
                pos_count = pos_count + 1
        all_pos_counts.append(pos_count)
        pos_count = 0
    return all_pos_counts
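To check what countPOS produces without downloading the Brown corpus or the tagger models, here is a minimal sketch of an equivalent function run on a tiny hand-tagged input. The two toy sentences and their tags are made up for illustration; the input shape, a list of (word, tag) pairs per sentence, is what nltk's pos_tag_sents returns.

```python
from typing import List, Sequence, Tuple

# A sentence tagged by nltk.pos_tag_sents is a list of (word, tag) pairs.
TaggedSentence = Sequence[Tuple[str, str]]


def countPOS(pos_tag_sent: Sequence[TaggedSentence], POS: str) -> List[int]:
    """Return, for each sentence, how many tags start with the 2-letter prefix POS.

    Comparing tag[:2] lets 'NN' match NN, NNS, NNP and NNPS,
    and 'JJ' match JJ, JJR and JJS.
    """
    all_pos_counts = []
    for sentence in pos_tag_sent:
        # Count matching tags in this sentence only, so no manual reset is needed.
        pos_count = sum(1 for _word, tag in sentence if tag[:2] == POS)
        all_pos_counts.append(pos_count)
    return all_pos_counts


# Hypothetical hand-tagged input, so no nltk data download is required.
toy = [
    [("The", "DT"), ("jury", "NN"), ("said", "VBD"), ("Friday", "NNP")],
    [("She", "PRP"), ("was", "VBD"), ("happy", "JJ"), ("and", "CC"), ("calmer", "JJR")],
]
print(countPOS(toy, "NN"))  # [2, 0]
print(countPOS(toy, "JJ"))  # [0, 2]
```

Each returned list has one entry per sentence, which is why it can be assigned directly as a new DataFrame column such as df['NN'].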
df['NN'] = countPOS(pos_all, 'NN')
df['JJ'] = countPOS(pos_all, 'JJ')
df.groupby('label').sum()
df.to_csv("df_news_romance.csv", index=False)
df = pd.read_csv("df_news_romance.csv")
fv = df[["NN", "JJ"]]
df['label'].value_counts()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fv, df['label'],
                                                    stratify=df['label'],
                                                    test_size=0.25,
                                                    random_state=42)
print(X_train.shape)
print(X_test.shape)
from sklearn.svm import LinearSVC
classifier = LinearSVC()
classifier.fit(X_train, y_train)
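This is where I get the error above. To add more detail to the original post: I have tried increasing max_iter and passing LinearSVC(dual=False), but neither made any difference. Any help would be greatly appreciated!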
Answer 0 (score: 0)
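You may need to set LinearSVC(dual=False) when your data has more samples than features. In its original configuration LinearSVC sets dual=True, which means it solves the dual optimization problem (recent scikit-learn releases default to dual="auto" instead). You can also try increasing the maximum number of iterations (for example, max_iter=10000; the default is 1000).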
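As a minimal sketch of the settings suggested above, the snippet below fits LinearSVC(dual=False, max_iter=10000) and records whether a ConvergenceWarning was raised. The data is a hypothetical stand-in for the question's two count features (NN, JJ), generated as random Poisson counts, not the Brown corpus. Wrapping the model in a pipeline with StandardScaler is an extra assumption that goes beyond the original answer: standardizing count features onto a comparable scale often helps liblinear converge.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical stand-in for fv = df[["NN", "JJ"]]:
# 2 count features and many more samples than features.
rng = np.random.default_rng(0)
X = rng.poisson(lam=[8, 3], size=(400, 2)).astype(float)
y = np.where(X[:, 0] + rng.normal(0, 2, size=400) > 8, "news", "romance")

# dual=False solves the primal problem, preferred when n_samples > n_features;
# max_iter raises liblinear's iteration cap above the default of 1000.
# StandardScaler is an assumption added here, not part of the answer.
clf = make_pipeline(StandardScaler(), LinearSVC(dual=False, max_iter=10000))

# Record any ConvergenceWarning instead of just printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf.fit(X, y)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("converged:", converged)
print("train accuracy:", clf.score(X, y))
```

With the question's data, the same pipeline would be fitted on X_train and y_train and evaluated with clf.score(X_test, y_test).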