I have a data frame, called Dataframe 1, that looks like this:
year month day feature_x feature_y
2020 5 1 3 2
2020 5 1 1 3
2020 5 1 2 1
Then I applied pd.get_dummies(), and the result, Dataframe 2, looks like this:
year month day feature_x_1 feature_x_2 feature_x_3 feature_y_1 feature_y_2 feature_y_3
2020 5 1 0 0 1 0 1 0
2020 5 1 1 0 0 0 0 1
2020 5 1 0 1 0 1 0 0
Now I have a prediction data frame, Dataframe 3, which has the same layout as the first data frame but only one row. It looks like this:
year month day feature_x feature_y
2020 2 10 1 3
The desired output should have the same columns as Dataframe 2, like this:
year month day feature_x_1 feature_x_2 feature_x_3 feature_y_1 feature_y_2 feature_y_3
2020 2 10 1 0 0 0 0 1
Can anyone help me? Thanks.
Answer 0 (score: 3)
You can use reindex so that the resulting data frame has the same columns as the second one:
Dataframe4 = pd.get_dummies(Dataframe3, columns=['feature_x', 'feature_y']
).reindex(columns=Dataframe2.columns).fillna(0).astype('int')
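For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the frames from the question and, as a small variation that is my assumption rather than part of the answer above, passes fill_value=0 to reindex instead of calling fillna(0) afterwards.

```python
import pandas as pd

# Frames from the question
Dataframe1 = pd.DataFrame(
    [[2020, 5, 1, 3, 2], [2020, 5, 1, 1, 3], [2020, 5, 1, 2, 1]],
    columns=['year', 'month', 'day', 'feature_x', 'feature_y'],
)
Dataframe2 = pd.get_dummies(Dataframe1, columns=['feature_x', 'feature_y'])

Dataframe3 = pd.DataFrame(
    [[2020, 2, 10, 1, 3]],
    columns=['year', 'month', 'day', 'feature_x', 'feature_y'],
)

# Encode the single row, then align it to Dataframe2's columns.
# fill_value=0 fills the dummy columns that the new row did not produce.
Dataframe4 = (
    pd.get_dummies(Dataframe3, columns=['feature_x', 'feature_y'])
    .reindex(columns=Dataframe2.columns, fill_value=0)
    .astype(int)
)
print(Dataframe4)
```

Note that reindex also drops any column that is not in Dataframe2, so a category that never appeared in the original data ends up as all zeros rather than raising an error.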
Answer 1 (score: 2)
I suggest the following:
import pandas as pd

# initialize the provided data frames
Dataframe1 = pd.DataFrame([[2020, 5, 1, 3, 2],
[2020, 5, 1, 1, 3],
[2020, 5, 1, 2, 1]] ,
columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])
Dataframe2 = pd.get_dummies(Dataframe1, columns = ['feature_x', 'feature_y'])
Dataframe3 = pd.DataFrame([[2020, 2, 10, 1, 3]] ,
columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])
# a dictionary of each feature for which dummies are desired
features_to_dummies = {'feature_x' : [], 'feature_y' : []}
# add the corresponding dummies as values to the dictionary
for feature in features_to_dummies.keys():
    for column_name in Dataframe2.columns.values:
        # match on the prefix so that e.g. 'feature_x' does not pick up 'other_feature_x_1'
        if column_name.startswith(feature + '_'):
            features_to_dummies[feature].append(column_name)
# add the same dummy variables to Dataframe3, all initialized to 0
for feature in features_to_dummies.keys():
    for dummy in features_to_dummies[feature]:
        Dataframe3[dummy] = 0
# set the dummy variables to the proper value
for feature in features_to_dummies.keys():
    Dataframe3[feature + '_' + str(Dataframe3.iloc[0][feature])] = 1
# drop the initial features
Dataframe3.drop(columns = features_to_dummies.keys(), inplace = True)
This produces the desired output:
year month day feature_x_1 feature_x_2 feature_x_3 feature_y_1 feature_y_2 feature_y_3
0 2020 2 10 1 0 0 0 0 1
Note that with this approach, the features to be converted into dummies must be hard-coded (added to the features_to_dummies dictionary in the form 'feature_name': []).
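If building the dummy column names by hand is a concern, one possible alternative (a different technique from the loops above, offered only as a sketch) is to convert each feature to a pandas Categorical whose categories come from the original data. pd.get_dummies then emits one column per declared category, including those absent from the new row. The features list is an assumption here and still has to be named explicitly.

```python
import pandas as pd

Dataframe1 = pd.DataFrame(
    [[2020, 5, 1, 3, 2], [2020, 5, 1, 1, 3], [2020, 5, 1, 2, 1]],
    columns=['year', 'month', 'day', 'feature_x', 'feature_y'],
)
Dataframe3 = pd.DataFrame(
    [[2020, 2, 10, 1, 3]],
    columns=['year', 'month', 'day', 'feature_x', 'feature_y'],
)

features = ['feature_x', 'feature_y']  # assumed list of columns to encode
for feature in features:
    # Categories observed in the original data; values not in this list become NaN
    categories = sorted(Dataframe1[feature].unique())
    Dataframe3[feature] = pd.Categorical(Dataframe3[feature], categories=categories)

# One dummy column is produced per declared category
encoded = pd.get_dummies(Dataframe3, columns=features).astype(int)
print(encoded)
```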
Let me know if this helps.
Answer 2 (score: 1)
Try this.
import pandas as pd
Dataframe1 = pd.DataFrame([[2020, 5, 1, 3, 2],
[2020, 5, 1, 1, 3],
[2020, 5, 1, 2, 1]] ,
columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])
Dataframe2 = pd.get_dummies(Dataframe1, columns = ['feature_x', 'feature_y'])
Dataframe3 = pd.DataFrame([[2020, 2, 10, 1, 3]] ,
columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])
Dataframe4 = pd.get_dummies(Dataframe3, columns = ['feature_x', 'feature_y'])
misscols = list(set(Dataframe2.columns) - set(Dataframe4.columns))
for col in misscols:
    Dataframe4[col] = 0
Dataframe4 = Dataframe4[Dataframe2.columns]
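One thing worth checking with this approach is what happens when the new row contains a category that never appeared in the original data. The sketch below wraps the same steps in a hypothetical helper, align_dummies (the name and signature are mine, not from pandas), and feeds it a row with feature_x = 4. The reference column list is written out by hand to keep the example short.

```python
import pandas as pd

def align_dummies(new_df, reference_columns, dummy_columns):
    """Hypothetical helper: one-hot encode new_df and match reference_columns."""
    encoded = pd.get_dummies(new_df, columns=dummy_columns)
    # Add dummy columns that exist in the reference but not in the new data
    for col in reference_columns:
        if col not in encoded.columns:
            encoded[col] = 0
    # Selecting by reference_columns restores the order and drops extras,
    # such as dummies for categories never seen in the reference frame
    return encoded[list(reference_columns)].astype(int)

reference_columns = ['year', 'month', 'day',
                     'feature_x_1', 'feature_x_2', 'feature_x_3',
                     'feature_y_1', 'feature_y_2', 'feature_y_3']
unseen = pd.DataFrame([[2020, 2, 10, 4, 3]],
                      columns=['year', 'month', 'day', 'feature_x', 'feature_y'])
result = align_dummies(unseen, reference_columns, ['feature_x', 'feature_y'])
print(result)
```

Here the feature_x_4 column is silently dropped by the final column selection, so all three feature_x dummies are 0 for that row. Whether that is acceptable depends on how the downstream model should treat unknown categories.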