I am trying to plot a logistic regression from the following data:
X = np.array([0,1,2,3,4,5,6,7,8,9,10,11])
y = np.array([0,0,0,0,1,0,1,0,1,1,1,1])
However, when I try:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
X = np.array([0,1,2,3,4,5,6,7,8,9,10,11])
y = np.array([0,0,0,0,1,0,1,0,1,1,1,1])
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, y)
I get the following error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 12]
I'm a bit confused about why it thinks X or y has only one sample.
Answer 0 (score: 2)
Modern versions of sklearn expect X to be a 2D array of shape (n_samples, n_features), so try reshaping it as the error message suggests:
In [7]: clf.fit(X.reshape(-1,1), y)
Out[7]:
LogisticRegression(C=100000.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
By the way, sklearn 0.19.1 gives me a clearer error message:
In [10]: sklearn.__version__
Out[10]: '0.19.1'
In [11]: clf.fit(X, y)
...
skipped
...
ValueError: Expected 2D array, got 1D array instead:
array=[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
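To make the two reshape options in that message concrete, here is a small NumPy-only sketch of the shapes involved. The np.atleast_2d line is an assumption on my part: it illustrates how a 1D array can end up being read as a single sample, which would be consistent with the "[1, 12]" in the question's error, but I have not checked what the asker's sklearn version does internally.

```python
import numpy as np

X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# Twelve samples, one feature each: the layout sklearn expects here.
X_col = X.reshape(-1, 1)

# One sample with twelve features: not what we want for this data.
X_row = X.reshape(1, -1)

# Assumption: promoting a 1D array to 2D this way yields a single row,
# which would explain a reported sample count of 1.
X_promoted = np.atleast_2d(X)

print(X.shape)           # (12,)
print(X_col.shape)       # (12, 1)
print(X_row.shape)       # (1, 12)
print(X_promoted.shape)  # (1, 12)
```

Since y has 12 entries, only the (12, 1) layout gives matching sample counts for X and y.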
Update: full code:
In [41]: %paste
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
import sklearn
X = np.array([0,1,2,3,4,5,6,7,8,9,10,11])
y = np.array([0,0,0,0,1,0,1,0,1,1,1,1])
print('SkLearn version: {}'.format(sklearn.__version__))
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X.reshape(-1,1), y)
## -- End pasted text --
SkLearn version: 0.19.1
Out[41]:
LogisticRegression(C=100000.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
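Since the original goal was to plot the fit, here is one possible way to draw the fitted curve once the model trains. This is only a sketch: the grid size, the variable names X_grid and prob, the Agg backend, and the output filename are my own choices and not part of the code above.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn import linear_model

X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
y = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X.reshape(-1, 1), y)

# Evaluate the model on a dense grid; the grid must be 2D as well.
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
# Column 1 of predict_proba holds the estimated probability of class 1.
prob = clf.predict_proba(X_grid)[:, 1]

fig, ax = plt.subplots()
ax.scatter(X, y, color="black", label="data")
ax.plot(X_grid.ravel(), prob, color="red", label="P(y=1 | x)")
ax.set_xlabel("X")
ax.set_ylabel("y")
ax.legend()
fig.savefig("logistic_fit.png")  # hypothetical filename
```

Note that the grid passed to predict_proba needs the same reshape(-1, 1) treatment as the training data, for the same reason.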