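I am using the Boston housing dataset from sklearn.datasets and have run ridge and lasso regression on my data (after a train/test split). I am now trying to perform k-fold cross-validation to find the optimal penalty parameter and have written the code below. What can I do to fix this and find the optimal penalty parameter for Ridge and Lasso regression using k-fold validation? Thanks.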
from sklearn.model_selection import RepeatedKFold
from numpy import arange
cva = RepeatedKFold(n_splits=10,n_repeats = 3, random_state=42)
kmodel = LassoCV(alphas=arange(0,1,.01), cv=cva,n_jobs=-1)
model.fit(x_train,y_train)
print(model.alpha_)
This then produces the following error message:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer 0 (score: 0)
I set up the Boston dataset as follows:
from sklearn.model_selection import RepeatedKFold
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
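If I run the cross-validated lasso, I don't get the error you are seeing. Avoid running with alpha = 0, since that is not lasso (with no penalty it reduces to ordinary least squares):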
cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = LassoCV(alphas=np.arange(0.001, 1, 0.01), cv=cva, n_jobs=-1)
model.fit(x_train,y_train)
Then:
print(model.alpha_)
0.001
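The question also asks about Ridge, which the snippet above does not cover. A minimal sketch of the same search with both estimators follows. Two things here are assumptions rather than part of the original setup. First, the data is synthetic, generated with make_regression as a stand-in, because load_boston was removed in scikit-learn 1.2. Second, the names lasso, ridge and alphas are only illustrative. Each estimator is fit under the name it was assigned to. The question's code assigns kmodel but fits model, which may be worth checking.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import RepeatedKFold, train_test_split

# Assumption: synthetic stand-in data with 13 features, not the Boston dataset.
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Same repeated k-fold splitter and strictly positive alpha grid as above.
cva = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
alphas = np.arange(0.001, 1, 0.01)

# Lasso: the penalty is chosen by cross-validation over the alpha grid.
lasso = LassoCV(alphas=alphas, cv=cva)
lasso.fit(x_train, y_train)

# Ridge: RidgeCV accepts the same grid and the same CV splitter.
ridge = RidgeCV(alphas=alphas, cv=cva)
ridge.fit(x_train, y_train)

print("lasso alpha:", lasso.alpha_)
print("ridge alpha:", ridge.alpha_)
print("lasso test R^2:", lasso.score(x_test, y_test))
print("ridge test R^2:", ridge.score(x_test, y_test))
```

The selected values depend on the data, so they will differ from the 0.001 shown above. If the chosen alpha sits at an edge of the grid, widening the grid in that direction is usually worth trying.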