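I have a block of code that currently works, but I end up repeating it over and over, and I don't know how to turn it into a function (or simplify it enough that it could become one).
In the simplest terms, before the function runs my data looks like the first three columns below, and the Output column is what the function needs to create: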
Years | Input | Growth Vector | Output
----- | ----- | ------------- | ------
2015  |       | NaN           | 37.40
2016  |       | 1.5375        | 57.50
2017  | 75.00 | 1.3043        | 75.00
2018  |       | 1.4213        | 106.60
2019  |       | 1.4309        | 152.53
2020  |       | 1.3418        | 204.67
2021  |       | 1.3843        | 283.32
2022  |       | 1.5978        | 452.71
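Each Output value is the neighbouring year's value either multiplied by a growth factor (going forward) or divided by one (going backward), so the whole column can be rebuilt from a single known value with a cumulative product. A minimal sketch for one series, assuming the growth factor for a year equals that year's value divided by the previous year's, and that at least one value is known; `impute_from_anchor` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def impute_from_anchor(values: pd.Series, growth: pd.Series) -> pd.Series:
    """Fill missing entries of `values` from its first known value.

    Assumes growth[t] == values[t] / values[t - 1] and that `values`
    contains at least one non-null entry.
    """
    # Cumulative growth index; the first factor is NaN, so treat it as 1.
    index = growth.fillna(1.0).cumprod()
    # Position of the first known value (the anchor, 75.00 in 2017 here).
    anchor = int(values.notna().to_numpy().argmax())
    scale = values.iloc[anchor] / index.iloc[anchor]
    # Keep known values; fill the gaps from the rescaled index.
    return values.fillna(index * scale)


years = range(2015, 2023)
growth = pd.Series([np.nan, 1.5375, 1.3043, 1.4213, 1.4309, 1.3418, 1.3843, 1.5978], index=years)
values = pd.Series([np.nan, np.nan, 75.0, np.nan, np.nan, np.nan, np.nan, np.nan], index=years)
result = impute_from_anchor(values, growth)
print(result.round(2))
```

Rescaling the index so it passes through the anchor year covers the backward years (2015, 2016) and the forward years (2018 onward) in one step; the printed values should match the Output column up to rounding of the four-decimal growth factors shown.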
A few wrinkles:
Here is a version of the code I'm currently using:
import pandas as pd

# Step 1: aggregate the user-defined Growth Vector source column
# according to user-defined dimensions and calculate the annual growth factor
# (percent change + 1) within each Dimension1/Dimension2 group
df_change = (df
    .sort_values(by=['Dimension1', 'Dimension2', 'Year'])
    .loc[:, ['Dimension1', 'Dimension2', 'Year', 'Value1']]
    .groupby(['Dimension1', 'Dimension2', 'Year'])
    .sum()
    .assign(Growth_Vector=lambda d: d.groupby(level=['Dimension1', 'Dimension2'])['Value1'].pct_change() + 1)
    .reset_index()
)
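One detail worth checking in Step 1: if `pct_change` is applied to the whole aggregated frame, the first year of each Dimension1/Dimension2 group is compared with the last year of the previous group instead of being left as NaN. A minimal sketch that computes the factor strictly within each group by dividing by the previous year's value; `add_growth_vector` is a hypothetical helper name and the data is made up:

```python
import pandas as pd


def add_growth_vector(df, dims, year_col="Year", value_col="Value1"):
    """Aggregate value_col by dims + year and add a per-group growth factor."""
    agg = (df.groupby(dims + [year_col], as_index=False)[value_col].sum()
             .sort_values(dims + [year_col])
             .reset_index(drop=True))
    # Previous year's value within the same group; NaN at each group's first year.
    previous = agg.groupby(dims)[value_col].shift(1)
    # Equivalent to pct_change() + 1 on data without gaps, but never crosses a group boundary.
    agg["Growth_Vector"] = agg[value_col] / previous
    return agg


# Hypothetical data: group "a" has two rows for 2016 that get summed.
df = pd.DataFrame({
    "Dimension1": ["a", "a", "a", "a", "b", "b"],
    "Dimension2": ["x"] * 6,
    "Year": [2015, 2016, 2016, 2017, 2015, 2016],
    "Value1": [10.0, 5.0, 10.0, 30.0, 4.0, 2.0],
})
growth = add_growth_vector(df, ["Dimension1", "Dimension2"])
print(growth)
```

Group "b" starts with NaN rather than 4 / 30, which is what the shift-based ratio is meant to guarantee.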
# Step 2: expand the Growth Vector temporary data frame to include
# all dimension tags from the main data frame (so as to make both data frames
# have the same length). This will replicate the Growth Vector values across
# dimensions not used in the Growth Vector calculation.
key_df = df[['Dimension1', 'Dimension2', 'Dimension3']].drop_duplicates().reset_index(drop=True)
min_year = df['Year'].min()   # scalar; df[['Year']].min() would return a Series
max_year = df['Year'].max()
years = pd.DataFrame(data={'dummy': 1, 'Year': list(range(min_year, max_year + 1))})
# Cross join on the shared 'dummy' key: every dimension combination x every year
df_expander = key_df.assign(dummy=1).merge(years).drop('dummy', axis=1)
expanded_df = df_expander.merge(df, how='left', on=['Dimension1', 'Dimension2', 'Dimension3', 'Year'])
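Step 2's dummy-column merge is a cross join, which pandas 1.2 and later can do directly with `merge(how="cross")`. A minimal sketch, where `expand_years` is a hypothetical helper name, the year column is assumed to hold integers, and the data is made up:

```python
import pandas as pd


def expand_years(df, dims, year_col="Year"):
    """Return one row per (dimension combination, year), left-joined to df."""
    keys = df[dims].drop_duplicates()
    first, last = int(df[year_col].min()), int(df[year_col].max())
    years = pd.DataFrame({year_col: range(first, last + 1)})
    # Cross join: every key row paired with every year, no dummy column needed.
    grid = keys.merge(years, how="cross")
    return grid.merge(df, how="left", on=dims + [year_col])


df = pd.DataFrame({"Dimension1": ["a", "a", "b"],
                   "Year": [2015, 2017, 2016],
                   "Value2": [1.0, 3.0, 2.0]})
expanded = expand_years(df, ["Dimension1"])
print(expanded)
```

Years absent from the original data come back as rows with NaN in the value columns, ready for imputation.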
# Step 3: impute the missing years in order so as to provide the recursive
# values for multiple years in a row being imputed. Coalesce the columns together
# without overwriting any previously calculated or original values.
# Note: the shift() calls assume rows are sorted by dimensions, then Year,
# and they do not stop at group boundaries.
df2 = (expanded_df.merge(df_change[['Dimension1', 'Dimension2', 'Year', 'Growth_Vector']],
                         how='left',
                         on=['Dimension1', 'Dimension2', 'Year'])
    # Backward: next year's value divided by next year's growth factor
    .assign(value_imp_2016=lambda df: (df['Value2'] / df['Growth_Vector']).shift(-1))
    .assign(value_imp_2015=lambda df: (df['value_imp_2016'] / df['Growth_Vector']).shift(-1))
    # Forward: previous year's value multiplied by this year's growth factor
    .assign(value_imp_2018=lambda df: df['Value2'].shift(1) * df['Growth_Vector'])
    .assign(value_imp_2019=lambda df: df['value_imp_2018'].shift(1) * df['Growth_Vector'])
    .assign(value_imp_2020=lambda df: df['value_imp_2019'].shift(1) * df['Growth_Vector'])
    .assign(value_imp_2021=lambda df: df['value_imp_2020'].shift(1) * df['Growth_Vector'])
    .assign(value_imp_2022=lambda df: df['value_imp_2021'].shift(1) * df['Growth_Vector'])
    # Fill the gaps in Value2 from each imputed column in turn
    .assign(**{'Value2 Imputed': lambda df: df['Value2'].fillna(df['value_imp_2015'])})
    .assign(**{'Value2 Imputed': lambda df: df['Value2 Imputed'].fillna(df['value_imp_2016'])})
    .assign(**{'Value2 Imputed': lambda df: df['Value2 Imputed'].fillna(df['value_imp_2018'])})
    .assign(**{'Value2 Imputed': lambda df: df['Value2 Imputed'].fillna(df['value_imp_2019'])})
    .assign(**{'Value2 Imputed': lambda df: df['Value2 Imputed'].fillna(df['value_imp_2020'])})
    .assign(**{'Value2 Imputed': lambda df: df['Value2 Imputed'].fillna(df['value_imp_2021'])})
    .assign(**{'Value2 Imputed': lambda df: df['Value2 Imputed'].fillna(df['value_imp_2022'])})
    .assign(Value2=lambda df: df['Value2 Imputed'])
    .drop(columns=['Growth_Vector',
                   'value_imp_2016',
                   'value_imp_2015',
                   'value_imp_2018',
                   'value_imp_2019',
                   'value_imp_2020',
                   'value_imp_2021',
                   'value_imp_2022',
                   'Value2 Imputed'])
)
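In this example, "df" is the large data frame containing the Growth Vector source column "Value1", the column "Value2" that needs the extra years imputed, and all of the dimension columns. The data frame "df_change" holds the "Growth_Vector" column alongside the dimension columns it was aggregated on; Step 2 expands the data so that the two frames line up row for row. Finally, "df2" is the output frame: it matches the expanded "df", except that "Value2" now holds the output column, with the missing years filled in.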
Please help! What would this look like as a callable function? Is there a simpler way to accomplish what I'm trying to do?
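As one possible shape for a callable version, the three steps can be folded into a single function that takes the column names as parameters. It replaces the hard-coded value_imp_YYYY chain with a per-series cumulative product, so it no longer depends on which years happen to be missing. This is a sketch under assumptions, not a drop-in replacement: `impute_with_growth` and its parameter names are hypothetical; `growth_dims` must be a subset of `all_dims`; every year between the minimum and maximum is assumed to appear in the source data (a missing growth factor is silently treated as 1); and `merge(how="cross")` requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd


def impute_with_growth(df, growth_dims, all_dims, source_col="Value1",
                       target_col="Value2", year_col="Year"):
    """Fill missing years of target_col using growth factors derived from source_col.

    growth_dims: dimensions used to aggregate source_col (a subset of all_dims).
    all_dims: every dimension column that identifies one target series.
    """
    # Step 1: aggregate the source column and compute year-over-year factors per group.
    growth = (df.groupby(growth_dims + [year_col], as_index=False)[source_col].sum()
                .sort_values(growth_dims + [year_col]))
    previous = growth.groupby(growth_dims)[source_col].shift(1)
    growth["Growth_Vector"] = growth[source_col] / previous

    # Step 2: build the full dimensions x years grid and attach data and factors.
    first, last = int(df[year_col].min()), int(df[year_col].max())
    years = pd.DataFrame({year_col: range(first, last + 1)})
    out = (df[all_dims].drop_duplicates()
             .merge(years, how="cross")
             .merge(df, how="left", on=all_dims + [year_col])
             .merge(growth[growth_dims + [year_col, "Growth_Vector"]],
                    how="left", on=growth_dims + [year_col])
             .sort_values(all_dims + [year_col])
             .reset_index(drop=True))

    # Step 3: cumulative growth index per series (a NaN factor counts as 1),
    # rescaled so it passes through the first known value of each series.
    keys = [out[c] for c in all_dims]
    index = out["Growth_Vector"].fillna(1.0).groupby(keys).cumprod()
    scale = (out[target_col] / index).groupby(keys).transform("first")
    out[target_col] = out[target_col].fillna(index * scale)
    return out.drop(columns="Growth_Vector")


# Hypothetical example: two Dimension3 series sharing one Dimension1/Dimension2 group.
rows = []
for d3, known in [("p", [np.nan, 10.0, np.nan, np.nan]), ("q", [np.nan, np.nan, np.nan, 3.0])]:
    for year, v1, v2 in zip(range(2015, 2019), [1.0, 2.0, 4.0, 8.0], known):
        rows.append({"Dimension1": "A", "Dimension2": "x", "Dimension3": d3,
                     "Year": year, "Value1": v1, "Value2": v2})
df = pd.DataFrame(rows)

result = impute_with_growth(df, ["Dimension1", "Dimension2"],
                            ["Dimension1", "Dimension2", "Dimension3"])
print(result)
```

Because the growth index is shared across each Dimension1/Dimension2 group, both Dimension3 series grow at the same aggregated rate, which mirrors the replication described in Step 2. When a series has more than one known value, all of them are kept, but the gaps are scaled from the earliest one; re-anchoring each gap on its nearest known value would need an extra step.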