I am trying to fit my model, but I keep getting the following error:
y = column_or_1d(y, warn=True)
Traceback (most recent call last):
File "/Users/amanpuranik/PycharmProjects/covid/fake news 2.py", line 107, in <module>
model.fit(x_train,y_test)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/naive_bayes.py", line 609, in fit
X, y = self._check_X_y(X, y)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/naive_bayes.py", line 475, in _check_X_y
return check_X_y(X, y, accept_sparse='csr')
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 765, in check_X_y
check_consistent_length(X, y)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 212, in check_consistent_length
" samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [3207, 802]
As I understand it, this means that x and y have different shapes when I fit them. However, when I print their shapes, they are the same:
(4009, 1)
(4009, 1)
So I'm not sure why this error occurs. What can I do to fix it? Here is my code:
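import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

data = pd.read_csv("/Users/amanpuranik/Desktop/fake-news-detection/data.csv")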
data = data[['Headline', "Label"]]
x = np.array(data['Headline'])
y = np.array(data["Label"])
#lowercase
lower = [[word.lower() for word in headline] for headline in stemmed2] #start here
#convert lower into a list of strings
lower_sentences = [" ".join(x) for x in lower]
print(lower_sentences)
#organising
articles = []
for headline in lower:
articles.append(headline)
#print(articles[0])
#creating the bag of words model
headline_bow = CountVectorizer()
headline_bow.fit(lower_sentences)
a = headline_bow.transform(lower_sentences)
print(a)
b = headline_bow.get_feature_names()
#testing and training part
yy = np.reshape(y,(-1,1))
lower2 = np.reshape(lower_sentences,(-1,1))
x_train, x_test, y_train, y_test = train_test_split(lower2, yy, test_size=0.2, random_state=1)
print(lower2.shape)
print(yy.shape)
#fitting on the model now
model = MultinomialNB() #don't forget these brackets here
model.fit(x_train,y_test) #this is where the error comes in
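Answer 0 (score: 1)
Change model.fit(x_train,y_test) to model.fit(x_train,y_train). I don't know whether this will resolve the error, but the original call is wrong: you can't fit on training data paired with test labels.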
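The two sizes in the traceback, [3207, 802], add up to the 4009 rows printed in the question, which is what an 80/20 split would produce: 3207 training rows were passed together with 802 test labels. Below is a minimal sketch of pairing training features with training labels. The headlines and labels are hypothetical stand-ins for data.csv, the label encoding is an assumption, and splitting the raw text before vectorizing is one possible arrangement rather than the asker's exact pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in data; the question loads headlines and labels from data.csv.
headlines = [
    "aliens endorse local mayor",
    "city council approves new budget",
    "miracle pill cures everything overnight",
    "schools reopen after winter storm",
    "celebrity secretly a robot",
    "central bank holds interest rates",
    "moon made of cheese scientists say",
    "new bridge opens to traffic",
    "president abducted by bigfoot",
    "farmers report strong harvest",
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])  # assumed encoding: 1 = fake, 0 = real

# Split the raw text and the 1-D labels together so the rows stay aligned.
x_train, x_test, y_train, y_test = train_test_split(
    headlines, labels, test_size=0.2, random_state=1
)

# Learn the vocabulary from the training text, then reuse it for the test text.
vectorizer = CountVectorizer()
x_train_bow = vectorizer.fit_transform(x_train)
x_test_bow = vectorizer.transform(x_test)

model = MultinomialNB()
model.fit(x_train_bow, y_train)  # training features with training labels
predictions = model.predict(x_test_bow)

print(x_train_bow.shape[0], len(y_train), len(predictions))
```

Note that the model is fitted on the bag-of-words matrix rather than on the raw strings: MultinomialNB expects numeric feature counts, so passing the reshaped text array from the question would likely raise a different error even after the labels are corrected.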
Answer 1 (score: 0)
I think it should be y = [column_or_1d(y, warn=True)]
rather than y = column_or_1d(y, warn=True)
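For context, the line y = column_or_1d(y, warn=True) at the top of the traceback appears to be the tail of a scikit-learn warning that is printed when y is passed as a column vector of shape (n_samples, 1) instead of a 1-D array. Assuming that reading is right, an alternative to touching that line is to flatten the labels in the calling code. A small sketch with hypothetical labels:

```python
import numpy as np

y = np.array([1, 0, 1, 0])    # hypothetical labels, shape (4,)
yy = np.reshape(y, (-1, 1))   # column vector, as built in the question: shape (4, 1)
flat = yy.ravel()             # back to a 1-D array of shape (4,)

print(yy.shape, flat.shape)
```

Passing the 1-D array (or simply the original y) to train_test_split and fit avoids the conversion step that triggers the warning. This only addresses the warning; the ValueError itself comes from the mismatched sample counts.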