Question

I have the following working code:

imputer = Imputer(missing_values = 'NaN', strategy='mean', axis = 0)
imputer = imputer.fit(X_train[['Age']])
X_train['Age'] = imputer.transform(X_train[['Age']])

This raises the following warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I still get the same warning when I use the following line. Why?

X_train['Age'] = imputer.transform(X_train[['Age']])

If I try to apply the same logic everywhere:

imputer = Imputer(missing_values = 'NaN', strategy='mean', axis = 0)
imputer = imputer.fit(X_train.loc[:,'Age'])
X_train.loc[:,'Age'] = imputer.transform(X_train.loc[:, 'Age'])

I get the following message and the code does not run:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

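Could someone please explain the correct way to pass columns to the Imputer using labels?

I'm not clear on the difference between [['Age']] and .loc[:, 'Age']. They seem to contain the same data but have different shapes.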

Answer 1

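Judging by the last error you got, when you select a DataFrame column as in imputer = imputer.fit(X_train.loc[:, 'Age']), you are actually passing a Series to the Imputer, and a Series is one-dimensional: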

type(X_train['Age'])
pandas.core.series.Series

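However, the fit() method expects a two-dimensional array. Instead, you can index the Age column in a way that returns a DataFrame (i.e. two-dimensional). For example, if Age is the third column: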

type(X_train.iloc[:,2:3])
pandas.core.frame.DataFrame

This way you won't get the dimension error. I tested this for your use case and it works.
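A minimal sketch of the point above, using a small made-up DataFrame (the Name and Sex columns and all values are hypothetical, chosen so that Age sits at position 2 as in the iloc[:, 2:3] call). It prints the type and shape of each selection: a single label gives a 1-D Series, while a list of labels, whether X_train[['Age']] or X_train.loc[:, ['Age']], gives a one-column 2-D DataFrame. The label-based .loc[:, ['Age']] form does not depend on where the column happens to sit.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; 'Age' is deliberately the third column (position 2)
X_train = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Sex": ["m", "f", "f", "m"],
    "Age": [22.0, np.nan, 35.0, 28.0],
})

selections = {
    "X_train['Age']": X_train["Age"],                    # single label -> Series (1-D)
    "X_train.loc[:, 'Age']": X_train.loc[:, "Age"],      # single label -> Series (1-D)
    "X_train[['Age']]": X_train[["Age"]],                # list of labels -> DataFrame (2-D)
    "X_train.loc[:, ['Age']]": X_train.loc[:, ["Age"]],  # list of labels -> DataFrame (2-D)
    "X_train.iloc[:, 2:3]": X_train.iloc[:, 2:3],        # positional slice -> DataFrame (2-D)
}

for expr, obj in selections.items():
    print(f"{expr:<26} {type(obj).__name__:<10} ndim={obj.ndim} shape={obj.shape}")
```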

Answer 2

Use the fit_transform method:

imputer = Imputer(missing_values='NaN', strategy="mean", axis=0)
X_train[['Age']] = imputer.fit_transform(X_train[['Age']])
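A short sketch of what fit_transform does, assuming a recent scikit-learn in which the old sklearn.preprocessing.Imputer has been replaced by sklearn.impute.SimpleImputer (which treats np.nan as missing by default); the data is made up. It checks that fit_transform gives the same result as calling fit and then transform, and writes the result back through the two-dimensional X_train[['Age']] selection.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data; SimpleImputer stands in for the removed Imputer class
X_train = pd.DataFrame({"Age": [22.0, np.nan, 35.0, 28.0]})

# Two-step version: learn the column mean, then fill the gaps
two_step = SimpleImputer(strategy="mean").fit(X_train[["Age"]]).transform(X_train[["Age"]])

# One-step version, as in the answer above
imputer = SimpleImputer(strategy="mean")
one_step = imputer.fit_transform(X_train[["Age"]])

print(imputer.statistics_)  # the mean learned for each column
X_train[["Age"]] = one_step  # (n, 1) array written back into the one-column selection
print(X_train)
```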

Answer 3

I recommend the following approach:

X.loc[:,'Age'] = imputer.fit_transform(X[['Age']])

Working example:

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

# np.nan marks the missing value (SimpleImputer's default missing_values)
X = pd.DataFrame({'Age': [12, 13, np.nan, 23, 31, 12, 43, 32, 42]})
imputer = SimpleImputer(strategy='mean')
X.loc[:,'Age'] = imputer.fit_transform(X[['Age']])

# Output:
    Age
0   12.0
1   13.0
2   26.0
3   23.0
4   31.0
5   12.0
6   43.0
7   32.0
8   42.0

The input to the imputer should be a 2D {array-like, sparse matrix} of shape (n_samples, n_features), or a DataFrame. X['Age'] on its own returns a pd.Series object, whereas X[['Age']] returns a DataFrame.
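To illustrate the 2D requirement, here is a sketch with the same Age values as the working example (again assuming SimpleImputer). Fitting on the Series X['Age'] raises the ValueError that suggests array.reshape(-1, 1). Besides X[['Age']], converting the Series to a one-column DataFrame with to_frame() or to an (n, 1) NumPy array with reshape(-1, 1) also gives the imputer 2-D input, and all three produce the same filled column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

X = pd.DataFrame({"Age": [12, 13, np.nan, 23, 31, 12, 43, 32, 42]})
imputer = SimpleImputer(strategy="mean")

# 1-D input (a Series) is rejected with the "Reshape your data ..." ValueError
try:
    imputer.fit(X["Age"])
except ValueError as err:
    print("Series rejected:", str(err).splitlines()[0])
    rejected = True
else:
    rejected = False

# Each of these is 2-D and accepted
filled_df = imputer.fit_transform(X[["Age"]])                           # one-column DataFrame
filled_frame = imputer.fit_transform(X["Age"].to_frame())               # Series -> DataFrame
filled_arr = imputer.fit_transform(X["Age"].to_numpy().reshape(-1, 1))  # (n, 1) ndarray

print(filled_df.ravel())
```

Of these, X[['Age']] keeps the column label, which is what the question asks for.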

Correct way to pass columns to the Imputer using labels?

3 answers