Question

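I have a banking_dataframe with 21 columns: one is the target, 10 are numeric features, and 10 are categorical features. I used the pandas get_dummies method to one-hot encode the categorical data, and the returned dataframe has 74 columns. Now I want to merge the encoded dataframe with the original one, so that the final data has one-hot encoded values for the categorical columns but keeps the original dataframe's size, i.e. 21 columns.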

This is the snippet where I call pandas' get_dummies function:

encoded_features = pd.get_dummies(banking_dataframe[categorical_feature_names])

Answer 1

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# creating a toy data frame to test
df = pd.DataFrame({'Gender': ['M', 'F', 'M', 'M', 'F', 'F', 'F']})

# instantiating and transforming the 'Gender' column of the df
one_hot = OneHotEncoder()
encoded = one_hot.fit_transform(df[['Gender']])

# The fitted encoder's 'categories_' attribute holds one array of category
# labels per input column; the labels for 'Gender' (index 0) serve as the
# names of the new columns in our data frame.

df[one_hot.categories_[0]] = encoded.toarray()

Answer 2

You can try the following:

pd.concat([df,encoded_features],axis=1)
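
Concatenating as-is keeps the raw categorical columns next to their encoded versions. A minimal sketch of one way to avoid that, assuming a small stand-in for `banking_dataframe` (the column names `age`, `job`, `marital`, and `y` are invented for illustration), is to drop the raw columns first, or to let `pd.get_dummies` do both steps through its `columns` argument:

```python
import pandas as pd

# Hypothetical stand-in for banking_dataframe:
# one numeric feature, two categorical features, one target.
banking_dataframe = pd.DataFrame({
    "age": [30, 41, 25],
    "job": ["admin", "technician", "admin"],
    "marital": ["single", "married", "single"],
    "y": [0, 1, 0],
})
categorical_feature_names = ["job", "marital"]

# Same call as in the question.
encoded_features = pd.get_dummies(banking_dataframe[categorical_feature_names])

# Drop the raw categorical columns, then append the encoded ones.
merged = pd.concat(
    [banking_dataframe.drop(columns=categorical_feature_names), encoded_features],
    axis=1,
)

# One-step alternative: encode only the listed columns and keep the rest.
one_step = pd.get_dummies(banking_dataframe, columns=categorical_feature_names)

print(merged.shape, one_step.shape)
```

Either way the result is wider than the original frame. With the numbers from the question, the 11 non-categorical columns plus the 74 encoded columns give 85 columns, not 21.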

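If you do not want to increase the number of columns, try label encoding instead of pd.get_dummies(): pd.get_dummies() adds new columns to the dataset, whereas label encoding encodes the values within the column itself.
Try this: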

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['Categorical column_name'] = le.fit_transform(df['Categorical column_name'])
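
LabelEncoder handles one column per call. As a rough sketch of applying the same in-place encoding to several columns, the example below swaps in pandas category codes for LabelEncoder; the toy frame and its column names are assumptions made for illustration. For string columns without missing values, both approaches assign integers according to the sorted order of the labels.

```python
import pandas as pd

# Hypothetical toy frame: two categorical columns and one numeric column.
df = pd.DataFrame({
    "job": ["technician", "admin", "admin"],
    "marital": ["single", "married", "single"],
    "age": [30, 41, 25],
})
categorical_cols = ["job", "marital"]

# Work on a copy so the original frame stays untouched.
encoded = df.copy()
for col in categorical_cols:
    # Convert to a categorical dtype (categories are sorted), then replace
    # each value by the integer position of its category.
    encoded[col] = encoded[col].astype("category").cat.codes

print(encoded.shape == df.shape)
```

The column count stays the same, which matches the 21-column requirement in the question, but the integer codes imply an ordering between categories that some models may treat as meaningful.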

How do I merge the returned one-hot encoded columns into the original dataframe?
