Logistic regression on a CSV dataset

Time: 2018-12-30 21:30:35

Tags: python scikit-learn logistic-regression

I am trying to run logistic regression on my dataset:

#,sugars_100g,energy_100g,is_over200
0,0.0,0.0,1
1,14.29,2243.0,1
2,17.86,1941.0,1
3,3.57,2540.0,1
4,0.0,1552.0,1
5,11.54,1933.0,1
6,0.0,1490.0,1
...
...
...

This is what I tried:

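import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("Sugar_energy.csv")
x_train, x_test, y_train, y_test = train_test_split(df[['sugars_100g']], df.is_over200, test_size=0.1)
model = LogisticRegression()
model.fit(x_train, y_train)
# Predict over the values 1..2000 to see where the predicted class changes
wide_test = np.arange(1, 2001).reshape(-1, 1)
is_higher_than_200 = model.predict(wide_test)
plt.scatter(wide_test, is_higher_than_200, marker='+', color='red')
plt.show()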

This code works on other datasets I have, but I need to run it on the one above. Unfortunately, here I get this error:

/Users/myname/PycharmProjects/FoodQuerks/venv/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
/Users/myname/PycharmProjects/FoodQuerks/venv/lib/python3.6/site-packages/sklearn/utils/validation.py:761: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
Traceback (most recent call last):
  File "/Users/myname/PycharmProjects/FoodQuerks/Main/LogisticRegression.py", line 14, in <module>
    model.fit(x_train, y_train)
  File "/Users/myname/PycharmProjects/FoodQuerks/venv/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1305, in fit
    sample_weight=sample_weight)
  File "/Users/myname/PycharmProjects/FoodQuerks/venv/lib/python3.6/site-packages/sklearn/svm/base.py", line 881, in _fit_liblinear
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1

I have already tried the solution from here, so I added:

x_train, y_train = shuffle(x_train, y_train)

before fitting the model on the training data, but it did not help.
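The ValueError in the traceback says the labels passed to fit contain only class 1, and shuffling changes the order of the rows, not which classes are present. A minimal sketch of how one might check the class distribution and keep both classes in the training split is shown here. The DataFrame is a made-up stand-in for Sugar_energy.csv (same column names, invented values), and it assumes the real file does contain some rows with is_over200 equal to 0; if it does not, no split can fix the error.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Sugar_energy.csv: same column names, invented values.
df = pd.DataFrame({
    "sugars_100g": [0.0, 14.29, 17.86, 3.57, 0.0, 11.54, 0.0, 25.0, 1.2, 30.5],
    "is_over200":  [0, 1, 1, 0, 0, 1, 0, 1, 0, 1],
})

# Count the rows per class. If only one class shows up here,
# fit() will raise the "only one class" ValueError.
class_counts = df["is_over200"].value_counts()
print(class_counts)

# stratify keeps the class proportions the same in the train and test parts,
# so a rare class cannot end up entirely in the test part.
x_train, x_test, y_train, y_test = train_test_split(
    df[["sugars_100g"]], df["is_over200"],
    test_size=0.2, stratify=df["is_over200"], random_state=0,
)

# Naming the solver explicitly also silences the FutureWarning from the log.
model = LogisticRegression(solver="lbfgs")
model.fit(x_train, y_train)
```

If value_counts() reports a single class on the real data, the target column itself would need to be rebuilt (for example, by checking how is_over200 was derived) before any classifier can be trained.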

1 answer:

Answer 0: (score: -1)

You need to change the line model.fit(x_train, y_train) to model.fit(x_train, y_train.values.ravel())

On a pandas DataFrame, .values returns the underlying NumPy array and .ravel() flattens a column vector into a 1d array, so that should do the job.
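To make the shape change concrete, here is a small sketch with made-up labels (the name y_frame and its values are illustrative, not taken from the question's file). It compares a single-column DataFrame, the flattened array, and a plain Series:

```python
import pandas as pd

# Made-up labels held in a single-column DataFrame, i.e. a column vector.
y_frame = pd.DataFrame({"is_over200": [1, 0, 1, 1]})

column_vector = y_frame.values       # 2d NumPy array, one column
flat = column_vector.ravel()         # flattened to 1d
series = y_frame["is_over200"]       # selecting one column gives a Series, already 1d

print(column_vector.shape, flat.shape, series.shape)
```

Flattening changes the shape of y, not its contents, so it addresses the DataConversionWarning in the log rather than the number of classes present in the labels. A Series such as df.is_over200 is already one-dimensional and can be passed to fit as it is.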