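I'm using sklearn.linear_model.LogisticRegression and would like to train the model with probability labels.
However, as the code below shows, I get an error when I try to train a logistic regression model on training data whose labels are probabilities.
Is there any way to train a logistic regression model with probability labels?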
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.array([1966, 1967, 1968, 1969, 1970,
1971, 1972, 1973, 1974, 1975,
1976, 1977, 1978, 1979, 1980,
1981, 1982, 1983, 1984]).reshape(-1, 1)
y = np.array([0.003, 0.016, 0.054, 0.139, 0.263,
0.423, 0.611, 0.758, 0.859, 0.903,
0.937, 0.954, 0.978, 0.978, 0.982,
0.985, 0.989, 0.988, 0.992])
lr = LogisticRegression()
lr.fit(x, y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-6f0a54f18841> in <module>()
13
14 lr = LogisticRegression()
---> 15 lr.fit(x, y) # => ValueError: Unknown label type: 'continuous'
/home/sudot/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/logistic.py in fit(self, X, y, sample_weight)
1172 X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
1173 order="C")
-> 1174 check_classification_targets(y)
1175 self.classes_ = np.unique(y)
1176 n_samples, n_features = X.shape
/home/sudot/anaconda3/lib/python3.6/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
170 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
171 'multilabel-indicator', 'multilabel-sequences']:
--> 172 raise ValueError("Unknown label type: %r" % y_type)
173
174
ValueError: Unknown label type: 'continuous'
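The check that fails in this traceback can be reproduced on its own. The following is a minimal sketch. It assumes a scikit-learn version that exposes `sklearn.utils.multiclass.type_of_target`, the helper that `check_classification_targets` uses to compute `y_type`. Float targets with non-integer values are classified as `'continuous'`, while the same values rounded to 0/1 are classified as `'binary'`.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Same probability labels as in the question
y = np.array([0.003, 0.016, 0.054, 0.139, 0.263,
              0.423, 0.611, 0.758, 0.859, 0.903,
              0.937, 0.954, 0.978, 0.978, 0.982,
              0.985, 0.989, 0.988, 0.992])

# Non-integer floats are treated as a regression target
print(type_of_target(y))           # continuous
# After rounding, only 0.0 and 1.0 remain, i.e. two classes
print(type_of_target(y.round(0)))  # binary
```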
Answer (score: 0)
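Logistic regression is a classification model (binary in this case), so fit() only accepts discrete class labels as targets; continuous values such as probabilities are rejected.
Simply round y before fitting: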
y = y.round(0) # Add this line
lr = LogisticRegression()
lr.fit(x, y)
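Rounding throws away the probability information. If the goal is to fit the probabilities themselves, one common workaround is to duplicate each sample. This is not part of the answer above. Each sample appears once with label 1 and weight p, and once with label 0 and weight 1 - p, and the weights are passed through fit's `sample_weight` argument. The weighted log-loss on the duplicated data equals the cross-entropy against the soft labels. The sketch below makes two assumptions that are not from the question. First, the years are standardized so the solver is well conditioned. Second, `C` is set to a large value so the default L2 penalty barely affects the fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(1966, 1985, dtype=float).reshape(-1, 1)
y = np.array([0.003, 0.016, 0.054, 0.139, 0.263,
              0.423, 0.611, 0.758, 0.859, 0.903,
              0.937, 0.954, 0.978, 0.978, 0.982,
              0.985, 0.989, 0.988, 0.992])

# Assumption: standardize the years; raw values around 1970 condition the solver poorly
x_scaled = (x - x.mean()) / x.std()

# Duplicate every sample: label 1 weighted by p, label 0 weighted by 1 - p
n = len(y)
X_dup = np.vstack([x_scaled, x_scaled])
y_dup = np.concatenate([np.ones(n), np.zeros(n)])
w_dup = np.concatenate([y, 1.0 - y])

# Assumption: a large C makes the default L2 penalty nearly negligible
lr = LogisticRegression(C=1e6, max_iter=1000)
lr.fit(X_dup, y_dup, sample_weight=w_dup)

# Column 1 of predict_proba is the probability of class 1.0
p = lr.predict_proba(x_scaled)[:, 1]
print(np.round(p, 3))
```

The fitted curve is a single sigmoid in the (scaled) year, so its predictions increase monotonically and can be compared directly against the original probability labels.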