Solved

Question

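I am trying to fit some linear regression models on a DataFrame of NBA player data in order to predict the target variable "ORPM". However, when the following code runs: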

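from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Features: every column except the identifiers and the target
X = orpm_data.drop(['Player', 'Lg', 'ORPM'], axis=1)
y = orpm_data['ORPM']
# `preprocessor` is the ColumnTransformer defined under EDIT2
linreg = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('linreg', LinearRegression())])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=99)
linreg.fit(X_train, y_train)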

the following error is raised:

ValueError: 'ORPM' is not in list

What am I missing?

Edit in response to comments:

print(X) prints the whole DataFrame, which is too large to show here, but X.info() returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 603 entries, 0 to 602
Data columns (total 29 columns):
Tm        603 non-null object
Season    603 non-null object
Age       603 non-null int64
G         603 non-null int64
GS        603 non-null int64
MP        603 non-null int64
FG        603 non-null float64
FGA       603 non-null float64
2P        603 non-null float64
2PA       603 non-null float64
3P        603 non-null float64
3PA       603 non-null float64
FT        603 non-null float64
FTA       603 non-null float64
ORB       603 non-null float64
DRB       603 non-null float64
TRB       603 non-null float64
AST       603 non-null float64
STL       603 non-null float64
BLK       603 non-null float64
TOV       603 non-null float64
PF        603 non-null float64
PTS       603 non-null float64
FG%       600 non-null float64
2P%       600 non-null float64
3P%       517 non-null float64
eFG%      600 non-null float64
FT%       588 non-null float64
TS%       600 non-null float64
dtypes: float64(23), int64(4), object(2)
memory usage: 136.7+ KB

print(y) returns:

None
0      2.38
1      3.87
2     -1.21
3      1.58
4     -4.30
5     -0.62
       ... 
598   -2.64
599    0.95
600   -2.98
601   -0.98
602   -2.08
Name: ORPM, Length: 603, dtype: float64

EDIT2: more details about the preprocessing pipeline

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ['Age', 'G', 'GS', 'MP', 'FG', 'FGA',
       '2P', '2PA', '3P', '3PA', 'FT', 'FTA', 'ORB', 'DRB', 'TRB', 'AST',
       'STL', 'BLK', 'TOV', 'PF', 'PTS', 'FG%', '2P%', '3P%', 'eFG%', 'FT%',
       'TS%','ORPM']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='mean')),
    ('scaler', StandardScaler()),
    ('PCA', PCA())])

categorical_features = ['Tm', 'Season']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant',fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

The full error stack trace is here: https://github.com/aj-1000/debugging-regression-model/blob/master/README.md

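Solved

The problem was solved by removing 'ORPM' from numeric_features. I had already dropped that column from X before passing the data to the pipeline, so the ColumnTransformer was being asked to select a column that no longer existed.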
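For anyone hitting the same message, here is a minimal sketch of how the mismatch could be caught before fitting. It uses only plain Python lists, with the column names copied from the X.info() output and the EDIT2 snippet above. The helper `missing_columns` and the variable names `x_columns`, `lookup_error` and `fixed_numeric_features` are made up for this illustration and are not part of sklearn or pandas. The wording of the error resembles what Python's own `list.index` produces for a missing item, which suggests (as an assumption, not something checked against the sklearn source) that the column lookup fails in a similar way.

```python
# Columns that are actually in X, copied from the X.info() output in the question.
x_columns = ['Tm', 'Season', 'Age', 'G', 'GS', 'MP', 'FG', 'FGA', '2P', '2PA',
             '3P', '3PA', 'FT', 'FTA', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK',
             'TOV', 'PF', 'PTS', 'FG%', '2P%', '3P%', 'eFG%', 'FT%', 'TS%']

# Feature lists as declared in EDIT2 (still containing the dropped target).
numeric_features = ['Age', 'G', 'GS', 'MP', 'FG', 'FGA', '2P', '2PA', '3P', '3PA',
                    'FT', 'FTA', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV',
                    'PF', 'PTS', 'FG%', '2P%', '3P%', 'eFG%', 'FT%', 'TS%', 'ORPM']
categorical_features = ['Tm', 'Season']


def missing_columns(declared, available):
    """Hypothetical helper: declared column names that are absent from `available`."""
    available = set(available)
    return [name for name in declared if name not in available]


# Looking up a dropped column in a plain list raises a ValueError as well.
try:
    x_columns.index('ORPM')
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

# Names the transformers ask for but X no longer has.
missing = missing_columns(numeric_features + categorical_features, x_columns)

# The fix from the post: keep only the numeric features still present in X.
fixed_numeric_features = [name for name in numeric_features if name in x_columns]
```

In a real session, `list(X.columns)` would take the place of the hand-copied `x_columns`, and `fixed_numeric_features` would be passed to the 'num' entry of the ColumnTransformer instead of the original list.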

Fitting a regression model in sklearn results in "ValueError: 'column name' is not in list"
