I'm using scikit-learn for polynomial regression and trying to interpret the coefficients. But scikit-learn doesn't label the output, so all I get is this:
[ 0.,0.95545289,0.,0.20682341,-0.,0.,-0.,-0.,0.,0.,0.,-0.,0.,-0.,-0.,]
How do I map the coefficients to the features that were created? My code so far:
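from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(interaction_only=True)
X_ = poly.fit_transform(X_train_minmax)
X_test1 = poly.transform(X_test_minmax)  # reuse the transformer fitted on the training data
lasso_model = linear_model.LassoCV(cv=10, copy_X=True)  # normalize= was removed in scikit-learn 1.2
lasso_fit = lasso_model.fit(X_, y_train)
train_r2 = lasso_model.score(X_, y_train)  # R^2 on the training data
y_pred = lasso_model.predict(X_test1)
lasso_model.coef_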
THX!
Answer 0 (score: 0)
According to the docs for PolynomialFeatures:
powers_[i, j] is the exponent of the jth input in the ith output.
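To make that concrete, here is a small sketch on toy data (the two-column array X is made up for illustration). With two inputs and degree=2, each row of powers_ gives the exponents of X1 and X2 in one output column, in the same order as the columns of the transformed matrix.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data with two input features, X1 and X2 (assumed for illustration)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

poly = PolynomialFeatures(degree=2)
poly.fit(X)

# Row i = output column i; column j = exponent of input j in that output.
print(poly.powers_)
# [[0 0]
#  [1 0]
#  [0 1]
#  [2 0]
#  [1 1]
#  [0 2]]
```

So row 3 is [2, 0] (X1**2) and row 4 is [1, 1] (X1*X2).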
So something like this should work:
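a = poly.powers_  # shape (n_output_features, n_input_features)
columns = ['_'.join(['x{var}^{exp}'.format(var=var, exp=exp) for var, exp in enumerate(a[i, :])]) for i in range(a.shape[0])]
list(zip(columns, lasso_model.coef_))  # pairs each term name with its coefficient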
The important line is the one that builds columns. :)
Answer 1 (score: 0)
Let's assume you are running a degree-2 polynomial regression. Then:
poly = PolynomialFeatures(degree=2)  # generate a polynomial feature transformer
X_ = poly.fit_transform(input_data)  # ndarray to be used for the regression
where input_data = [X1, X2, X3, ...]  # actually an ndarray with one column per input; written as a list for simplicity
To find the indices in lasso_model.coef_ of the terms in which (say) X1 appears as a factor, i.e. X1, X1**2, X1*X2, X1*X3, ..., X1*Xn, use the following:
list_of_index = []
for j in range(poly.powers_.shape[1]):  # iterate over each input: X1, X2, etc.
    temp = []
    for i in range(X_.shape[1]):  # iterate over the columns of the polynomial feature matrix
        if poly.powers_[i, j] != 0:  # input j appears in output column i
            temp.append(i)
    list_of_index.append(temp)
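list_of_index will be a list of lists: the first entry holds the column indices of the terms that have X1 as a factor, the second those with X2, and so on.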
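The same index lists can be computed without the explicit double loop, because a term contains input j exactly when column j of powers_ is non-zero. A sketch using NumPy's flatnonzero on toy two-feature data (the array X is assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy data with inputs X1, X2 (assumed)
poly = PolynomialFeatures(degree=2).fit(X)

# For each input j, the output columns whose exponent for input j is non-zero.
list_of_index = [np.flatnonzero(poly.powers_[:, j]).tolist()
                 for j in range(poly.powers_.shape[1])]
print(list_of_index)  # [[1, 3, 4], [2, 4, 5]]
```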
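Example:
For a degree-2 regression with only X1 and X2,
the generated ndarray columns will be [1, X1, X2, X1**2, X1*X2, X2**2]
and list_of_index will be [[1, 3, 4], [2, 4, 5]].
You can use these indices to access lasso_model.coef_.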