I have a df_train like this:
X1
01-01-2020 | 1
01-02-2020 | 2
01-03-2020 | 3
01-04-2020 | 4
01-05-2020 | 5
01-06-2020 | 6
01-07-2020 | 7
01-08-2020 | 8
Now I want to build another df with a datetime index.
I get the datetime index like this:
future_dates = pd.date_range(df_train.index.max(), periods=12, freq='M')
I want a new df that takes the value for the same month from df_train. If there is no data for a given month, it should use the mean of df_train.
Expected result:
X1
01-09-2020 | 36/8
01-10-2020 | 36/8
01-11-2020 | 36/8
01-12-2020 | 36/8
01-01-2021 | 1
01-02-2021 | 2
01-03-2021 | 3
01-04-2021 | 4
01-05-2021 | 5
01-06-2021 | 6
01-07-2021 | 7
01-08-2021 | 8
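
To make the setup above reproducible, here is a minimal sketch. It assumes the dates are day-month-year (so every row is the first of a month) and that a month-start frequency ('MS') is wanted, since the expected result lists first-of-month dates; 'M' in pandas denotes month-end dates instead. The one-month offset on the start date is also an assumption, added so the range begins at 01-09-2020 as in the expected result.

```python
import pandas as pd

# Rebuild the example frame; dates are read as day-month-year (assumption).
df_train = pd.DataFrame(
    {"X1": range(1, 9)},
    index=pd.date_range("2020-01-01", periods=8, freq="MS"),
)

# Start one month after the last training date so the range runs from
# 2020-09-01 through 2021-08-01, matching the expected result.
start = df_train.index.max() + pd.offsets.MonthBegin(1)
future_dates = pd.date_range(start, periods=12, freq="MS")

print(future_dates[0], future_dates[-1])
```

Starting directly at df_train.index.max(), as in the original call, would instead include August 2020 as the first entry.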
Answer 0 (score: 1)
IIUC, try reindex():
new_df_train=df_train.reindex(future_dates)
df_train['month']=df_train.index.month
new_df_train['X1']=new_df_train.index.month.map(df_train.set_index('month')['X1'])
Finally, use fillna():
new_df_train=new_df_train.fillna(new_df_train['X1'].mean())
Output of new_df_train:
X1
2020-08-01 8.0
2020-09-01 4.5
2020-10-01 4.5
2020-11-01 4.5
2020-12-01 4.5
2021-01-01 1.0
2021-02-01 2.0
2021-03-01 3.0
2021-04-01 4.0
2021-05-01 5.0
2021-06-01 6.0
2021-07-01 7.0
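
The same month-lookup idea can be sketched as a small self-contained function. The name fill_by_month and its parameters are hypothetical, the future index is assumed to be month-start and to begin in September 2020, and the fallback is the mean of the training column itself, which matches the mean of the mapped column here but need not in general.

```python
import pandas as pd

def fill_by_month(train: pd.DataFrame, dates: pd.DatetimeIndex, col: str = "X1") -> pd.DataFrame:
    """Hypothetical helper: look up each date's month in `train`,
    falling back to the training mean when that month is absent."""
    # One value per calendar month (averaged if a month occurs more than once).
    by_month = train.groupby(train.index.month)[col].mean()
    # Map every future date's month number onto the per-month values.
    values = pd.Series(dates.month, index=dates).map(by_month)
    # Months without training data become NaN; fill them with the training mean.
    return values.fillna(train[col].mean()).to_frame(col)

df_train = pd.DataFrame(
    {"X1": range(1, 9)},
    index=pd.date_range("2020-01-01", periods=8, freq="MS"),
)
future_dates = pd.date_range("2020-09-01", periods=12, freq="MS")

new_df_train = fill_by_month(df_train, future_dates)
print(new_df_train)
```

Grouping by month before mapping avoids a duplicate-index problem if the training data ever spans more than one year.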
import pandas as pd

data = {'workclass': ['State-gov', 'Self-emp-not-inc', 'Private']}
df = pd.DataFrame(data)

def agg_categorical_column(series):
    # Show what agg actually passes to the function
    print(f'Input object type: {type(series)}')
    print(f'Input object looks like:\n {series}')
    return [','.join(set(series))]

aggregations = {}
aggregations['workclass'] = agg_categorical_column

# Passing the function directly: works
res = df[['workclass']].agg(aggregations['workclass'])
print('Result is a Series, as expected:\n', res)
print('\n\n')

# Passing the dict: does not work; agg seems to call the function on each element, like map
res = df[['workclass']].agg(aggregations)
print('Result is a DataFrame, but with strange values:\n', res)
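
If the goal is simply one comma-joined string of the distinct values per column, a sketch that does not depend on how agg dispatches the function is to call a helper on each column directly. The helper name join_unique is hypothetical, and sorting is an added assumption to make the order deterministic.

```python
import pandas as pd

df = pd.DataFrame({"workclass": ["State-gov", "Self-emp-not-inc", "Private"]})

def join_unique(series: pd.Series) -> str:
    """Hypothetical helper: comma-join the distinct non-null values of a column."""
    # sorted() gives a stable order, unlike iterating over a set
    return ",".join(sorted(series.dropna().unique()))

# Call the helper on each column explicitly instead of going through agg
res = {col: join_unique(df[col]) for col in ["workclass"]}
print(res)
```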