Create a new DataFrame with 12-month lagged data from an old DataFrame

Time: 2021-06-23 04:09:48

Tags: python pandas

I have a df_train as follows:

             X1  
01-01-2020 | 1     
01-02-2020 | 2     
01-03-2020 | 3      
01-04-2020 | 4  
01-05-2020 | 5     
01-06-2020 | 6     
01-07-2020 | 7      
01-08-2020 | 8 

Now I want to build another df with a datetime index.

I obtain the datetime index like this:

future_dates = pd.date_range(df_train.index.max(), periods=12, freq='M')

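I want a new df that takes the value for the same calendar month from train_df. If there is no data for a month, it should use the mean of ts_train.

Expected result: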

               X1  
  01-09-2020 | 36/8     
  01-10-2020 | 36/8     
  01-11-2020 | 36/8      
  01-12-2020 | 36/8 
  01-01-2021 | 1     
  01-02-2021 | 2     
  01-03-2021 | 3      
  01-04-2021 | 4 
  01-05-2021 | 5     
  01-06-2021 | 6     
  01-07-2021 | 7      
  01-08-2021 | 8  

1 answer:

Answer 0: (score: 1)

IIUC:

Try reindex() on the future dates, then map each month to its value in df_train:

new_df_train=df_train.reindex(future_dates)
df_train['month']=df_train.index.month
new_df_train['X1']=new_df_train.index.month.map(df_train.set_index('month')['X1'])

Finally, fill the missing months with the mean:

new_df_train=new_df_train.fillna(new_df_train['X1'].mean())

Output of new_df_train:

             X1
2020-08-01  8.0
2020-09-01  4.5
2020-10-01  4.5
2020-11-01  4.5
2020-12-01  4.5
2021-01-01  1.0
2021-02-01  2.0
2021-03-01  3.0
2021-04-01  4.0
2021-05-01  5.0
2021-06-01  6.0
2021-07-01  7.0
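For reference, here is a minimal self-contained sketch of the month-lookup idea, written to reproduce the expected result from the question. It rests on a few assumptions that are not stated in the original: the index holds month-start dates, the future range begins one month after the last training date (hence `freq='MS'` and the `pd.DateOffset`), and each calendar month appears at most once in `df_train`. The helper names `by_month` and `months` are only illustrative.

```python
import pandas as pd

# Sample training data from the question: one value per month, Jan-Aug 2020.
# Assumption: the index is a DatetimeIndex of month-start dates.
df_train = pd.DataFrame(
    {"X1": range(1, 9)},
    index=pd.date_range("2020-01-01", periods=8, freq="MS"),
)

# Assumption: the new frame covers the 12 months after the last training date.
future_dates = pd.date_range(
    df_train.index.max() + pd.DateOffset(months=1), periods=12, freq="MS"
)

# Lookup table: calendar month (1-12) -> X1.
# This requires each month to appear at most once in df_train.
by_month = df_train.set_index(df_train.index.month)["X1"]

# Map every future date's month to its training value; months without
# training data become NaN and are then filled with the training mean.
months = pd.Series(future_dates.month, index=future_dates)
new_df_train = (
    months.map(by_month)
    .fillna(df_train["X1"].mean())
    .to_frame("X1")
)

print(new_df_train)
```

With this sample data, September to December 2020 receive the training mean (36/8 = 4.5), and January to August 2021 reuse the 2020 values for the same month.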

import pandas as pd

data = {'workclass':
        ['State-gov', 'Self-emp-not-inc', 'Private']
       }
df = pd.DataFrame(data)

def agg_categorical_column(series):
    # Show what agg actually passes in, then join the unique values.
    print(f'Input object type: {type(series)}')
    print(f'Input object looks like:\n {series}')
    return [','.join(set(series))]

aggregations = {}
aggregations['workclass'] = agg_categorical_column

# Passing the function directly: works.
res = df[['workclass']].agg(aggregations['workclass'])
print('Result is a Series, as expected.\n', res)

print('\n\n')
# Passing the dict: does not work as expected; agg appears to apply
# the function to each element, like map.
res = df[['workclass']].agg(aggregations)
print('Result is a DataFrame, but with strange values:\n', res)