Question

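I am working through a multiple linear regression exercise with scikit-learn.

I have a dataset with column names (labels), and I am passing it through a OneHotEncoder to encode the categorical column.

I can get the coefficients, but what I really want is to map each coefficient back to its original column name.

I am trying to do this by getting the feature names from the ColumnTransformer.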

print(ct.get_feature_names())
['encoder__x0_California', 'encoder__x0_Florida', 'encoder__x0_New York', 'x0', 'x1', 'x2']
# Multiple Linear Regression

As you can see above, the passthrough columns come back as x0, x1, x2.

The actual X column headers are ["R&D Spend", "Administration", "Marketing Spend", "State"].

State is the column that gets one-hot encoded.
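To make the encoding step concrete, here is a minimal sketch of what OneHotEncoder does to a State-like column. The four rows are made up for illustration and are not taken from 50_Startups.csv.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up sample of the State column (illustrative values only).
states = pd.DataFrame({"State": ["New York", "California", "Florida", "New York"]})

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for printing.
encoded = encoder.fit_transform(states).toarray()

# Categories are sorted, and each one becomes its own 0/1 column.
print(encoder.categories_[0])
print(encoded)
```

Each category becomes one indicator column, which is why a single State column turns into three encoder features.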

Any idea what the best way is to view the coefficient for each feature name?

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X)
# Encoding categorical data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
o = OneHotEncoder()
ct = ColumnTransformer(transformers=[('encoder',o, [3])], remainder='passthrough')
ft = ct.fit_transform(X)
X = np.array(ft)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Training the Multiple Linear Regression model on the Training set
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)
df = pd.DataFrame({'Test': y_test, 'Prediction': y_pred}, columns=['Test', 'Prediction'])
print(df)
# Output coefficients to dataframe with labels
print(ct.get_feature_names())
df_coef = pd.DataFrame({'feature_names': ct.get_feature_names(dataset.columns),
                      'coef': np.squeeze(regressor.coef_)})

print(df_coef)

Passing labels to the coefficients of a linear regression model

0 answers: