Question

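I'm trying to iterate over a DataFrame and conditionally factorize the data. I have a DataFrame with information about house prices, and instead of having values represented as strings, I want them to be categories represented by numbers (i.e. mansion = 0, house = 1). However, some columns are already integers or floats, so I only want to factorize the string columns.

I want to factorize the data so that I can use it with a Keras sequential neural network, without having to go through every column and factorize it by hand.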

columns = list(dataframe)
for i in columns:
    if type(i)==str:
        xtrain.i = pd.Categorical(pd.factorize(dataframe.i)[0])

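I thought this would factorize the data, but I get this error:

AttributeError: 'DataFrame' object has no attribute 'i'

pandas does not recognize that I'm trying to refer to the column selected by the loop variable. For reference, the line below does work in my code (MSZoning is one of the columns):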

xtrain.MSZoning = pd.Categorical(pd.factorize(xtrain.MSZoning)[0])

Any help or suggestions would be greatly appreciated!

Answer 1

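It should be more like the code below. Iterating over columns yields the column names, which are always strings, so test the dtype of the column itself; and use bracket indexing, because dataframe.i looks for a column literally named "i":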

for i in columns:
    if dataframe[i].dtypes=='object':
        xtrain[i] = pd.Categorical(pd.factorize(dataframe[i])[0])
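The loop above can be tried on a small frame. This is only a minimal sketch: the toy data (including the HouseStyle column and its values) is made up for illustration, and it checks "not numeric" with is_numeric_dtype rather than comparing against 'object', on the assumption that string columns may not always be reported as object dtype. It also stores the plain integer codes; wrap them in pd.Categorical as in the answer if a categorical dtype is wanted.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data standing in for the house-price frame.
xtrain = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "HouseStyle": ["mansion", "house", "house", "mansion"],
    "LotArea": [8450, 9600, 11250, 9550],
})

for col in xtrain.columns:
    # col is the column *name* (a str); the dtype belongs to the column itself.
    if not is_numeric_dtype(xtrain[col]):
        # factorize numbers the values in order of first appearance.
        codes, uniques = pd.factorize(xtrain[col])
        xtrain[col] = codes

print(xtrain)
```

Because factorize assigns codes in order of first appearance, "mansion" becomes 0 and "house" becomes 1 here, and the numeric LotArea column is left untouched.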

Since you are building an MLP, let's use LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

for i in columns:
    if dataframe[i].dtypes=='object':
        dataframe[i] = le.fit_transform(dataframe[i])
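One thing to be aware of with the loop above is that a single encoder is refit on every column, so only the mapping for the last column survives. If the original strings need to be recovered later, one option is to keep a separate encoder per column. The following is a sketch under assumptions, not part of the original answer: the toy data and the encoders dictionary are hypothetical, and scikit-learn is assumed to be installed.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the house-price frame.
dataframe = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "LotArea": [8450, 9600, 11250, 9550],
})

encoders = {}  # hypothetical helper: column name -> fitted LabelEncoder
for col in dataframe.columns:
    if not is_numeric_dtype(dataframe[col]):
        le = LabelEncoder()  # a fresh encoder for each column
        dataframe[col] = le.fit_transform(dataframe[col])
        encoders[col] = le

# Decode the integers back to the original labels when needed.
original = encoders["MSZoning"].inverse_transform(dataframe["MSZoning"])
```

Note that LabelEncoder numbers the classes in sorted order, whereas pd.factorize numbers them in order of first appearance, so the two approaches can assign different integers to the same label.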

How do I iterate over pandas DataFrame columns and factorize based on a condition?
