Pandas get_dummies for a prediction row

Asked: 2020-05-15 05:25:42

Tags: python pandas encoding

I have a data frame called Dataframe1 that looks like this:

year  month  day  feature_x feature_y
2020  5      1    3         2
2020  5      1    1         3
2020  5      1    2         1   

Then I applied pd.get_dummies() to it, which gives Dataframe2:

year  month  day  feature_x_1  feature_x_2  feature_x_3  feature_y_1  feature_y_2  feature_y_3
2020  5      1    0            0            1            0            1            0
2020  5      1    1            0            0            0            0            1
2020  5      1    0            1            0            1            0            0

Now I have a prediction data frame, Dataframe3, which has the same columns as the first data frame but only one row. It looks like this:

year  month  day  feature_x feature_y
2020  2      10   1         3

The desired output should have the same columns as Dataframe2, like this:

year  month  day  feature_x_1  feature_x_2  feature_x_3  feature_y_1  feature_y_2  feature_y_3
2020  2      10   1            0            0            0            0            1

Can someone help me? Thanks.

3 Answers:

Answer 0: (score: 3)

You can use reindex to give the resulting data frame the same columns as the second one:

Dataframe4 = pd.get_dummies(Dataframe3, columns=['feature_x', 'feature_y']
               ).reindex(columns=Dataframe2.columns).fillna(0).astype('int')
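
A self-contained sketch of the same idea, for anyone who wants to run it end to end. The names train, new_row, and dummy_cols are illustrative and not part of the question. It assumes a pandas version where get_dummies accepts dtype, and it passes fill_value=0 to reindex in place of the separate fillna(0).astype('int') step:

```python
import pandas as pd

# Training data from the question (Dataframe1) and the single prediction row (Dataframe3)
train = pd.DataFrame(
    [[2020, 5, 1, 3, 2], [2020, 5, 1, 1, 3], [2020, 5, 1, 2, 1]],
    columns=["year", "month", "day", "feature_x", "feature_y"],
)
new_row = pd.DataFrame([[2020, 2, 10, 1, 3]], columns=train.columns)

dummy_cols = ["feature_x", "feature_y"]

# dtype=int keeps 0/1 output (newer pandas versions default to booleans)
train_encoded = pd.get_dummies(train, columns=dummy_cols, dtype=int)

# Encode the new row, then align it to the training columns;
# dummy columns the new row did not produce are created and filled with 0
new_encoded = pd.get_dummies(new_row, columns=dummy_cols, dtype=int).reindex(
    columns=train_encoded.columns, fill_value=0
)

print(new_encoded)
```

Because reindex both adds the missing columns and puts them in the training order, no separate reordering step is needed.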

Answer 1: (score: 2)

I suggest the following:

import pandas as pd

# initialize the provided data frames
Dataframe1 = pd.DataFrame([[2020, 5, 1, 3, 2],
                           [2020, 5, 1, 1, 3],
                           [2020, 5, 1, 2, 1]] ,
                          columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])

Dataframe2 = pd.get_dummies(Dataframe1, columns = ['feature_x', 'feature_y'])

Dataframe3 = pd.DataFrame([[2020, 2, 10, 1, 3]] ,
                          columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])

# a dictionary of each feature for which dummies are desired
features_to_dummies = {'feature_x' : [], 'feature_y' : []}

# add the corresponding dummies as values to the dictionary
for feature in features_to_dummies.keys():
    for column_name in Dataframe2.columns.values:
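        # match on the 'feature_' prefix so that unrelated columns containing the name are not picked up
        if column_name.startswith(feature + '_'):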
            features_to_dummies[feature].append(column_name)

# add the same dummy variables to Dataframe3, all initialized to 0
for feature in features_to_dummies.keys():
    for dummy in features_to_dummies[feature]:
        Dataframe3[dummy] = 0

# set the dummy variables to the proper value
for feature in features_to_dummies.keys():
    Dataframe3[feature + '_' + str(Dataframe3.iloc[0][feature])] = 1

# drop the initial features
Dataframe3.drop(columns = list(features_to_dummies), inplace = True)

This produces the desired output:

        year    month   day feature_x_1 feature_x_2 feature_x_3 feature_y_1 feature_y_2 feature_y_3
0       2020    2       10  1           0           0           0           0           1

Note that with this approach, the features to be converted to dummies must be hard-coded (added to the 'features_to_dummies' dictionary in the form 'feature_name': []).
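
One possible way to avoid scanning the dummy column names by hand is to record the levels seen in the training data and convert each feature to a categorical with those levels before calling get_dummies. This is a different technique from the loops above, shown only as a sketch; train, new_row, dummy_cols, categories, and encode are hypothetical names, and the feature names are still listed explicitly:

```python
import pandas as pd

train = pd.DataFrame(
    [[2020, 5, 1, 3, 2], [2020, 5, 1, 1, 3], [2020, 5, 1, 2, 1]],
    columns=["year", "month", "day", "feature_x", "feature_y"],
)
new_row = pd.DataFrame([[2020, 2, 10, 1, 3]], columns=train.columns)

dummy_cols = ["feature_x", "feature_y"]

# Levels observed in the training data, per feature
categories = {col: sorted(train[col].unique()) for col in dummy_cols}


def encode(df):
    """One-hot encode dummy_cols using the levels recorded from the training data."""
    df = df.copy()
    for col, cats in categories.items():
        # A categorical with fixed categories makes get_dummies emit one
        # column per known level, even if that level is absent from df
        df[col] = pd.Categorical(df[col], categories=cats)
    return pd.get_dummies(df, columns=dummy_cols, dtype=int)


train_encoded = encode(train)
new_encoded = encode(new_row)
print(new_encoded)
```

With this sketch, a value that was not seen in training becomes a missing category, so all of that feature's dummy columns are 0 for that row.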

Let me know if this helps.

Answer 2: (score: 1)

Try this.

import pandas as pd
Dataframe1 = pd.DataFrame([[2020, 5, 1, 3, 2],
                           [2020, 5, 1, 1, 3],
                           [2020, 5, 1, 2, 1]] ,
                          columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])

Dataframe2 = pd.get_dummies(Dataframe1, columns = ['feature_x', 'feature_y'])

Dataframe3 = pd.DataFrame([[2020, 2, 10, 1, 3]] ,
                          columns = ['year', 'month', 'day', 'feature_x', 'feature_y'])


Dataframe4 = pd.get_dummies(Dataframe3, columns = ['feature_x', 'feature_y'])
misscols = list(set(Dataframe2.columns) - set(Dataframe4.columns))
for col in misscols:
    Dataframe4[col] = 0
Dataframe4 = Dataframe4[Dataframe2.columns]
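
One caveat worth checking, sketched below under the assumption that a prediction row may contain a value never seen in training (here a made-up feature_x of 4): selecting Dataframe2's columns at the end silently drops the extra dummy column, so the row ends up with all zeros for that feature. The names train, unseen, and extra are illustrative only.

```python
import pandas as pd

train = pd.DataFrame(
    [[2020, 5, 1, 3, 2], [2020, 5, 1, 1, 3], [2020, 5, 1, 2, 1]],
    columns=["year", "month", "day", "feature_x", "feature_y"],
)
# Hypothetical prediction row whose feature_x value (4) never occurs in train
unseen = pd.DataFrame([[2020, 2, 10, 4, 3]], columns=train.columns)

cols = ["feature_x", "feature_y"]
train_encoded = pd.get_dummies(train, columns=cols, dtype=int)
encoded = pd.get_dummies(unseen, columns=cols, dtype=int)

# Dummy columns produced by the new row that the training frame does not have
extra = [c for c in encoded.columns if c not in train_encoded.columns]
print("columns that will be dropped:", extra)

# Same steps as above: add the missing columns, then select the training columns
for col in set(train_encoded.columns) - set(encoded.columns):
    encoded[col] = 0
encoded = encoded[train_encoded.columns]
print(encoded)
```

Whether an all-zero row is acceptable depends on the model; inspecting extra before dropping it at least makes the situation visible.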