Can I use probabilistic labels when training a logistic regression model?

Time: 2019-08-14 15:28:22

Tags: python-3.x scikit-learn logistic-regression

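I am using sklearn.linear_model.LogisticRegression and would like to train the model on probabilistic labels, that is, targets that are probabilities between 0 and 1 rather than class labels.

However, as the code below shows, fitting a logistic regression model on training data with probabilistic labels raises an error.

Is there any way to train a logistic regression model with probabilistic labels?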

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([1966, 1967, 1968, 1969, 1970,
              1971, 1972, 1973, 1974, 1975,
              1976, 1977, 1978, 1979, 1980,
              1981, 1982, 1983, 1984]).reshape(-1, 1)

y = np.array([0.003, 0.016, 0.054, 0.139, 0.263,
              0.423, 0.611, 0.758, 0.859, 0.903,
              0.937, 0.954, 0.978, 0.978, 0.982,
              0.985, 0.989, 0.988, 0.992])

lr = LogisticRegression()
lr.fit(x, y)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-6f0a54f18841> in <module>()
     13 
     14 lr = LogisticRegression()
---> 15 lr.fit(x, y)  # => ValueError: Unknown label type: 'continuous'

/home/sudot/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/logistic.py in fit(self, X, y, sample_weight)
   1172         X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
   1173                          order="C")
-> 1174         check_classification_targets(y)
   1175         self.classes_ = np.unique(y)
   1176         n_samples, n_features = X.shape

/home/sudot/anaconda3/lib/python3.6/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    171             'multilabel-indicator', 'multilabel-sequences']:
--> 172         raise ValueError("Unknown label type: %r" % y_type)
    173 
    174 

ValueError: Unknown label type: 'continuous'

1 answer:

Answer 0 (score: 0)

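scikit-learn's LogisticRegression is a classifier, so fit expects discrete class labels as the target. You cannot pass non-categorical values such as probabilities; a continuous target raises the ValueError shown in the question.

A simple workaround is to round y before fitting. Note that rounding turns every probability into a hard 0/1 label, so the information carried by the probabilities themselves is discarded.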

y = y.round(0)  # Add this line

lr = LogisticRegression()
lr.fit(x, y)
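
If the probabilities themselves should drive the fit, one alternative to rounding is to minimise the cross-entropy against the soft labels directly. The block below is a minimal sketch of that idea, not part of scikit-learn: it uses only the standard library, fits a one-feature model with Newton's method, and applies no regularisation. The names fit_soft_logistic, predict, years, and probs are illustrative assumptions. Centring the year around its mean is a choice made here to keep the problem well conditioned.

```python
import math

# Data from the question: years 1966..1984 and the observed probabilities.
years = [1966 + i for i in range(19)]
probs = [0.003, 0.016, 0.054, 0.139, 0.263,
         0.423, 0.611, 0.758, 0.859, 0.903,
         0.937, 0.954, 0.978, 0.978, 0.982,
         0.985, 0.989, 0.988, 0.992]


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_soft_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit p(x) = sigmoid(w * (x - mean) + b) to soft labels ys.

    Minimises the cross-entropy -sum(y*log(p) + (1-y)*log(1-p))
    with Newton's method. Returns (w, b, mean).
    """
    mean = sum(xs) / len(xs)
    zs = [x - mean for x in xs]  # centred feature
    w = b = 0.0
    for _ in range(max_iter):
        ps = [sigmoid(w * z + b) for z in zs]
        # Gradient of the cross-entropy with respect to (w, b).
        g_w = sum((p - y) * z for p, y, z in zip(ps, ys, zs))
        g_b = sum(p - y for p, y in zip(ps, ys))
        # Entries of the symmetric 2x2 Hessian.
        h_ww = sum(p * (1 - p) * z * z for p, z in zip(ps, zs))
        h_wb = sum(p * (1 - p) * z for p, z in zip(ps, zs))
        h_bb = sum(p * (1 - p) for p in ps)
        det = h_ww * h_bb - h_wb * h_wb
        # Newton step: d = H^{-1} g, using the closed-form 2x2 inverse.
        d_w = (h_bb * g_w - h_wb * g_b) / det
        d_b = (h_ww * g_b - h_wb * g_w) / det
        w -= d_w
        b -= d_b
        if abs(d_w) + abs(d_b) < tol:
            break
    return w, b, mean


def predict(x, w, b, mean):
    """Predicted probability for a single year x."""
    return sigmoid(w * (x - mean) + b)


w, b, mean = fit_soft_logistic(years, probs)
fitted = [predict(x, w, b, mean) for x in years]
print("slope:", w, "intercept:", b)
```

Within scikit-learn, a commonly used route to a similar objective is to list every sample twice, once with label 0 and once with label 1, and pass 1 - p and p as sample_weight to fit. Keep in mind that LogisticRegression applies L2 regularisation by default, so its coefficients will generally differ somewhat from an unpenalised fit like the sketch above.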