Question

I am creating a LogisticRegression classifier with the following code:

regressor = LogisticRegression()
regressor.fit(x_train, y_train)

x_train and y_train both have the shape

<class 'tuple'>: (32383,)

x_train contains values roughly in the range [0..1], while y_train contains only 0s and 1s.

Unfortunately, fit fails with the error

ValueError: Found input variables with inconsistent numbers of samples: [1, 32383]

Transposing the arguments does not help.

Answer 1

I think a bit of reshaping is needed. I tried something like this:

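    from sklearn.linear_model import LogisticRegression
    import numpy as np

    # Synthetic stand-in for the features:
    # x_train = np.random.randn(10, 1)
    # Turn the tuple into a column vector of shape (n_samples, 1)
    x_train = np.asarray(x_train).reshape(-1, 1)
    # Prepend a constant (intercept) column of ones
    con = np.ones_like(x_train)
    x_train = np.concatenate((con, x_train), axis=1)

    # Synthetic stand-in for the labels:
    # y = np.random.randn(10, 1)
    # y_train = np.where(y < 0.5, 1, 0)
    # scikit-learn expects y with shape (n_samples,)
    y_train = np.asarray(y_train).reshape(-1)
    regressor = LogisticRegression()
    regressor.fit(x_train, y_train)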

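The commented-out lines are just what I used to generate some test data. One note on the constant column: scikit-learn's LogisticRegression fits an intercept by default (fit_intercept=True), so the column of ones is not required there; statsmodels' Logit, however, does not add one for you. If you are interested in statistical tests and a printed summary of the results, Statsmodels may also be useful: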

    from statsmodels.api import Logit

    logit = Logit(y_train, x_train)

    fit = logit.fit()
    print(fit.summary())

This gives you much more statistical information without much effort.

Answer 2

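Following up on the solution I proposed in the comments: the problem is the shape of x_train, so we need to reshape it.

From the documentation:

X : {array-like, sparse matrix}, shape (n_samples, n_features)

y : array-like, shape (n_samples,)

Example using scikit-learn and numpy: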
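The two documented shapes can be checked before calling fit. Below is a minimal sketch using only NumPy; the helper name `to_sklearn_inputs` and the demo tuples are made up for illustration and are not part of scikit-learn:

```python
import numpy as np

def to_sklearn_inputs(x, y):
    """Hypothetical helper: coerce 1-D sequences into the documented shapes."""
    X = np.asarray(x, dtype=float)
    if X.ndim == 1:
        # One feature per sample: (n_samples,) -> (n_samples, 1)
        X = X.reshape(-1, 1)
    y = np.asarray(y).ravel()  # labels stay 1-D: (n_samples,)
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} samples but y has {y.shape[0]}")
    return X, y

# Made-up demo data standing in for the question's tuples
x_demo = (0.1, 0.4, 0.8, 0.9)
y_demo = (0, 0, 1, 1)
X, y = to_sklearn_inputs(x_demo, y_demo)
print(X.shape, y.shape)  # (4, 1) (4,)
```

Using `reshape(-1, 1)` rather than a hard-coded sample count lets NumPy infer the number of rows, so the same call works for any length of input.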

from sklearn.linear_model import LogisticRegression
import numpy as np

# Create tuple data and convert it to a NumPy array
x_train = tuple(range(32383))
x_train = np.asarray(x_train)

# Same for y_train; binary labels here, since the question's y_train holds only 0s and 1s
y_train = tuple(i % 2 for i in range(32383))
y_train = np.asarray(y_train)

# Reshape x_train to (n_samples, n_features)
x_train = x_train.reshape(32383, 1)

# Check that the shape is (32383,)
y_train.shape

# Create the model
lg = LogisticRegression()

# Fit the model
lg.fit(x_train, y_train)

This should work fine. Hope it helps.
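For data closer to what the question describes (one feature roughly in [0..1], labels of 0 and 1), a rough end-to-end sketch might look like the following. The data is randomly generated for illustration, the sample count is a smaller stand-in for 32383, and the labeling rule (1 when the value exceeds 0.5) is an assumption, not something taken from the question:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples = 1000  # assumption: smaller stand-in for the question's 32383

# Made-up tuples: one feature in [0, 1), binary labels derived from it
x_values = tuple(rng.uniform(0.0, 1.0, size=n_samples))
y_values = tuple(int(v > 0.5) for v in x_values)

# X must be 2-D (n_samples, n_features); y stays 1-D (n_samples,)
X_train = np.asarray(x_values).reshape(-1, 1)
y_train = np.asarray(y_values)

model = LogisticRegression()  # fit_intercept=True by default
model.fit(X_train, y_train)

print(X_train.shape, y_train.shape)
print(model.coef_.shape, model.intercept_.shape)

# Predicted class probabilities for two new samples, also passed as a 2-D array
probs = model.predict_proba(np.array([[0.1], [0.9]]))
print(probs.shape)
```

New samples passed to predict or predict_proba need the same 2-D layout as the training features, which is why the two query values are wrapped as a (2, 1) array.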

Found input variables with inconsistent numbers of samples when fitting LogisticRegression
