Question

I am trying to fit data with logistic regression, but I get a ValueError.

I am using the iris dataset from sklearn:

# The data is in iris["data"] and target in iris["target"]
# For this section, we will work with a single feature 'petal width'
# which is the last (fourth) feature in iris["data"]
# We will assign class y=1 if the target's value is 2 and 0 otherwise

from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()

# petal width
X = np.array([len(iris["data"]),1]).reshape(-1,1)
# 1 if Iris virginica, else 0
y = []
for x in iris["target"]:
    if x == 2.0:
        y.append(1)
    else:
        y.append(0)
y = np.array(y)

# Import the LogisticRegression class from scikit learn
from sklearn.linear_model import LogisticRegression

# Initialize the LogisticRegression class, use lbfgs solver and random state of 42
log_reg = LogisticRegression(solver='lbfgs', random_state=42)

# Fit the data
log_reg.fit(X, y)

This is as far as I get:

ValueError: Found input variables with inconsistent numbers of samples: [2, 150]

I'm not sure whether it's my X or my y that is set up incorrectly?

Answer 1

The problem is the line where you build and reshape X:

X = np.array([len(iris["data"]),1]).reshape(-1,1)

This results in

X.shape
# (2,1)

so the numbers of samples are inconsistent, because

y.shape
# (150,)
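To see where the 2 in the error message comes from, a quick check of what that expression actually builds may help. This is a small illustrative snippet; the variable names wrong and wrong_X are hypothetical, chosen only for this demonstration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# len(iris["data"]) is just the integer 150, so this builds the
# two-element array [150, 1]; none of the actual feature values are used
wrong = np.array([len(iris["data"]), 1])
print(wrong.tolist())  # [150, 1]

# Reshaping to a column gives 2 "samples" instead of 150
wrong_X = wrong.reshape(-1, 1)
print(wrong_X.shape)  # (2, 1)
```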

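This is wrong: np.array([len(iris["data"]),1]) is just the two-element array [150, 1], not your feature data. Judging from the comments in your code, you only want the 4th feature (petal width), so you should change it to: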

X = iris['data'][:,3].reshape(-1,1)

which indeed gives the correct shape:

X.shape
# (150, 1)

and your model will then fit without any problems (tested).
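Putting the fix together, a minimal end-to-end sketch might look like the following. It assumes the same setup as the question; the vectorized y, the check of iris["feature_names"], and the two sample petal widths (0.5 and 2.5 cm) are illustrative additions, not part of the original post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Confirm that column index 3 of iris["data"] is petal width
print(iris["feature_names"][3])  # petal width (cm)

# Single feature kept 2D, shape (150, 1), as scikit-learn expects;
# iris["data"][:, 3:] would give the same array without reshape
X = iris["data"][:, 3].reshape(-1, 1)

# 1 if Iris virginica (target == 2), else 0: a vectorized form of the loop
y = (iris["target"] == 2).astype(int)

log_reg = LogisticRegression(solver="lbfgs", random_state=42)
log_reg.fit(X, y)

# Illustrative petal widths in cm (chosen for this example only)
new_widths = np.array([[0.5], [2.5]])
print(log_reg.predict(new_widths))
print(log_reg.predict_proba(new_widths))
```

The comparison (iris["target"] == 2) yields a boolean array, and astype(int) turns it into the same 0/1 labels your loop produces.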
