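I have a csv file containing input from users. The first column is STATISTIC, which names a function in my Python code; the following columns hold the different input variables that each statistic needs.
For example, the WEIGHTED_MEAN statistic needs VARIABLE_COLUMN and WEIGHT_VARIABLE.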
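To make the layout concrete, here is a small, hypothetical `report_inputs` table. The column names come from the code in this question, but the rows and the data column names (`region`, `product`, `rate`, `balance`) are invented for illustration. Cells that a statistic does not use are left empty, which pandas reads as NaN.

```python
import io

import pandas as pd

# Hypothetical csv: column names taken from the question, row values invented.
CSV_TEXT = """STATISTIC,GROUP_BY_VARIABLES,VARIABLE_COLUMN,WEIGHT_VARIABLE,COLUMNS_TO_STAT
WEIGHTED_MEAN,region || product,rate,balance,
MEAN,,,,balance || rate
"""

report_inputs = pd.read_csv(io.StringIO(CSV_TEXT))

# Only the first row carries the group-by list, stored as a " || "-separated string.
group_by_variables = report_inputs["GROUP_BY_VARIABLES"][0].split(" || ")
print(group_by_variables)  # ['region', 'product']

# Unused cells become NaN, e.g. COLUMNS_TO_STAT on the WEIGHTED_MEAN row.
print(report_inputs.loc[0, "COLUMNS_TO_STAT"])  # nan
```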
I read this csv file with the following Python code, where model_to_summarise is the DataFrame I need to summarise and report_inputs is the csv above:
def parse_report_input_table(model_to_summarise, report_inputs):
    statistics_dict = {
        "WEIGHTED_MEAN": Reporting.weighted_mean,
        "MEAN": Reporting.get_mean_of_columns,
        "SUM": Reporting.get_sum_of_columns,
        "MAX": Reporting.get_max_of_columns,
        "MIN": Reporting.get_min_of_columns,
        "COUNT": Reporting.get_count_of_columns,
        "PERIOD_END_BALANCES": Reporting.period_end_balances,
        "PERIOD_START_BALANCES": Reporting.period_start_balances,
        "AVERAGE_BALANCES": Reporting.average_balances,
        "RATIO_V1": Reporting.ratio_calculation_v1,
        "RATIO_V2": Reporting.ratio_calculation_v2
    }
    list_of_stat_reports = []
    group_by_variables = report_inputs["GROUP_BY_VARIABLES"][0].split(" || ")
    for index in report_inputs.index:
        statistic = report_inputs.loc[index, "STATISTIC"]
        # .get() returns None for an unknown statistic, so the else branch below
        # can report it (plain indexing would raise KeyError before reaching it)
        function_to_call = statistics_dict.get(statistic)
        if function_to_call == Reporting.weighted_mean:
            weighted_mean_report = function_to_call(model_to_summarise, group_by_variables,
                                                    report_inputs.loc[index, "VARIABLE_COLUMN"],
                                                    report_inputs.loc[index, "WEIGHT_VARIABLE"])
            list_of_stat_reports.append(weighted_mean_report)
        elif function_to_call in [
            Reporting.get_count_of_columns, Reporting.get_max_of_columns,
            Reporting.get_mean_of_columns, Reporting.get_min_of_columns,
            Reporting.get_sum_of_columns
        ]:
            columns_to_stat = report_inputs.loc[index, "COLUMNS_TO_STAT"].split(" || ")
            simple_stat_report = function_to_call(model_to_summarise,
                                                  group_by_variables,
                                                  columns_to_stat)
            list_of_stat_reports.append(simple_stat_report)
        elif function_to_call in [
            Reporting.period_end_balances,
            Reporting.period_start_balances,
            Reporting.average_balances
        ]:
            balances_df = function_to_call(model_to_summarise, group_by_variables,
                                           report_inputs.loc[index, "UNMODIFIED_DATE_COLUMN"],
                                           report_inputs.loc[index, "BALANCE_COLUMN"])
            list_of_stat_reports.append(balances_df)
        elif function_to_call == Reporting.ratio_calculation_v1:
            ratio_df_v1 = function_to_call(model_to_summarise, group_by_variables,
                                           report_inputs.loc[index, "NUMERATOR_VARIABLE"],
                                           report_inputs.loc[index, "DENOMINATOR_VARIABLE"],
                                           report_inputs.loc[index, "RATIO_NAME"])
            list_of_stat_reports.append(ratio_df_v1)
        elif function_to_call == Reporting.ratio_calculation_v2:
            ratio_df_v2 = function_to_call(model_to_summarise, group_by_variables,
                                           report_inputs.loc[index, "UNMODIFIED_DATE_COLUMN"],
                                           report_inputs.loc[index, "NUMERATOR_VARIABLE"],
                                           report_inputs.loc[index, "DENOMINATOR_VARIABLE"],
                                           report_inputs.loc[index, "RATIO_NAME"])
            list_of_stat_reports.append(ratio_df_v2)
        else:
            raise ValueError("{missing_stat} is not available at the moment!"
                             .format(missing_stat=statistic))
    return list_of_stat_reports, group_by_variables
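The first return value is a list of the DataFrames created, one per statistic the user requested in the csv file.
In this case the list would contain weighted_mean_df, mean_df, period_end_balances_df and ratio_v2_df.
As you can see, each function takes different inputs (some take the same inputs, so I grouped those together in the if/elif branches).
The dictionary statistics_dict isn't very large yet, so writing an if/elif for each function is manageable.
But statistics_dict will grow to 30-40 entries, and writing an if/elif for every statistic isn't good coding. Is there a way to make this more generic/dynamic?
At the moment I write an if/elif per statistic because they take different inputs.
This is a big question, so let me know if you need more clarification!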
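One common way to remove the per-statistic branches is a table-driven dispatch: each STATISTIC maps to its function plus the ordered list of csv columns that supply that function's extra arguments, so adding a statistic means adding one dictionary entry. This is a sketch, not the question's code: `weighted_mean` and `get_sum_of_columns` below are hypothetical stand-ins for the `Reporting` methods (assumed to take `(df, group_by, ...)` as in the question), and `LIST_COLUMNS` is an assumed set of columns whose cells hold " || "-separated lists.

```python
import pandas as pd


# Hypothetical stand-ins for Reporting methods; the real implementations may differ.
def weighted_mean(df, group_by, value_col, weight_col):
    # Per group: sum(value * weight) / sum(weight).
    tmp = df.assign(_wx=df[value_col] * df[weight_col])
    sums = tmp.groupby(group_by)[["_wx", weight_col]].sum()
    return (sums["_wx"] / sums[weight_col]).rename("weighted_mean").reset_index()


def get_sum_of_columns(df, group_by, columns):
    return df.groupby(group_by)[columns].sum().reset_index()


# STATISTIC -> (function, csv columns feeding its extra arguments, in order)
STATISTICS = {
    "WEIGHTED_MEAN": (weighted_mean, ["VARIABLE_COLUMN", "WEIGHT_VARIABLE"]),
    "SUM": (get_sum_of_columns, ["COLUMNS_TO_STAT"]),
}
# Csv columns whose cells are " || "-separated lists (assumption).
LIST_COLUMNS = {"COLUMNS_TO_STAT"}


def parse_report_input_table(model_to_summarise, report_inputs):
    group_by_variables = report_inputs["GROUP_BY_VARIABLES"].iloc[0].split(" || ")
    reports = []
    for _, row in report_inputs.iterrows():
        statistic = row["STATISTIC"]
        if statistic not in STATISTICS:
            raise ValueError(f"{statistic} is not available at the moment!")
        func, arg_columns = STATISTICS[statistic]
        # Read each argument from its csv column, splitting list-valued columns.
        args = [row[c].split(" || ") if c in LIST_COLUMNS else row[c]
                for c in arg_columns]
        reports.append(func(model_to_summarise, group_by_variables, *args))
    return reports, group_by_variables
```

The trade-off is that argument order lives in the table rather than in code, so each entry has to match its function's signature.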
Answer (score: 0)
I solved it with a class like this:
class ExampleClass:
    def __init__(self, var1, var2, var3):  # ...list every variable like this
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3

    def func1(self):
        # func1 needs var1 and var3, so it uses self.var1 and self.var3
        ...

    def func2(self):
        # func2 needs var1 and var2, so it uses self.var1 and self.var2
        ...

    # ...and so on for all the functions
Then I modified the parse_report_input_table function like this:
def parse_report_input_table(model_to_summarise, report_inputs):
    """
    Parse the csv table with inputs from the user.
    """
    list_of_individual_stat_reports = []
    group_by_variables = report_inputs["GROUP_BY_VARIABLES"][0].split(" || ")
    for index in report_inputs.index:
        reporting = Reporting(model_to_summarise,
                              group_by_variables,
                              report_inputs.loc[index, "NUMERATOR_VARIABLE"],
                              report_inputs.loc[index, "DENOMINATOR_VARIABLE"],
                              report_inputs.loc[index, "RATIO_NAME"],
                              report_inputs.loc[index, "VARIABLE_COLUMN"],
                              report_inputs.loc[index, "WEIGHT_VARIABLE"],
                              report_inputs.loc[index, "COLUMNS_TO_STAT"],
                              report_inputs.loc[index, "UNMODIFIED_DATE_COLUMN"],
                              report_inputs.loc[index, "BALANCE_COLUMN"])
        statistics_dict = {
            "WEIGHTED_MEAN": reporting.weighted_mean,
            "MEAN": reporting.get_mean_of_columns,
            "SUM": reporting.get_sum_of_columns,
            "MAX": reporting.get_max_of_columns,
            "MIN": reporting.get_min_of_columns,
            "COUNT": reporting.get_count_of_columns,
            "PERIOD_END_BALANCES": reporting.period_end_balances,
            "PERIOD_START_BALANCES": reporting.period_start_balances,
            "AVERAGE_BALANCES": reporting.average_balances,
            "RATIO_V1": reporting.ratio_calculation_v1,
            "RATIO_V2": reporting.ratio_calculation_v2
        }
        list_of_individual_stat_reports.append(
            statistics_dict[report_inputs.loc[index, "STATISTIC"]]())
    return list_of_individual_stat_reports, group_by_variables
This way, instantiating the class sets every parameter, but each method I actually call only uses the parameters it needs.
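A minimal runnable sketch of this pattern, using a hypothetical two-method `Reporting` class in place of the real one: the constructor stores every csv value, each method reads only the attributes it needs, and the STATISTIC-to-method mapping (`METHOD_NAMES`, a hypothetical name) is built once outside the loop, with `getattr` fetching the bound method for each row.

```python
import pandas as pd


class Reporting:
    """Hypothetical, trimmed-down version of the answer's class."""

    def __init__(self, df, group_by_variables, variable_column=None,
                 weight_variable=None, columns_to_stat=None):
        # Store every csv value; each method uses only what it needs.
        self.df = df
        self.group_by_variables = group_by_variables
        self.variable_column = variable_column
        self.weight_variable = weight_variable
        self.columns_to_stat = columns_to_stat

    def weighted_mean(self):
        # Uses only variable_column and weight_variable.
        df = self.df.assign(_wx=self.df[self.variable_column] * self.df[self.weight_variable])
        sums = df.groupby(self.group_by_variables)[["_wx", self.weight_variable]].sum()
        return (sums["_wx"] / sums[self.weight_variable]).rename("weighted_mean").reset_index()

    def get_sum_of_columns(self):
        # Uses only columns_to_stat, a " || "-separated string from the csv.
        columns = self.columns_to_stat.split(" || ")
        return self.df.groupby(self.group_by_variables)[columns].sum().reset_index()


# Built once, outside the loop: STATISTIC -> method name on Reporting.
METHOD_NAMES = {"WEIGHTED_MEAN": "weighted_mean", "SUM": "get_sum_of_columns"}


def parse_report_input_table(model_to_summarise, report_inputs):
    group_by_variables = report_inputs["GROUP_BY_VARIABLES"].iloc[0].split(" || ")
    reports = []
    for _, row in report_inputs.iterrows():
        # Series.get returns None when a column is missing from the csv.
        reporting = Reporting(model_to_summarise, group_by_variables,
                              row.get("VARIABLE_COLUMN"), row.get("WEIGHT_VARIABLE"),
                              row.get("COLUMNS_TO_STAT"))
        # getattr returns a bound method, so it is called with no arguments.
        reports.append(getattr(reporting, METHOD_NAMES[row["STATISTIC"]])())
    return reports, group_by_variables
```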
Suggestions for improvement are welcome, as I hadn't used Python classes much before this.
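One possible refinement, offered as a hedged sketch: rather than passing every csv column to a constructor, inspect each function's parameters with `inspect.signature` and fill them from the csv columns of the same name in upper case. This relies on an assumed naming convention (parameter `columns_to_stat` reads csv column `COLUMNS_TO_STAT`) that the question's `Reporting` methods may not follow; the functions below, including `ratio`, are hypothetical.

```python
import inspect

import pandas as pd


# Hypothetical report functions: extra parameters are named after csv columns.
def get_sum_of_columns(df, group_by, columns_to_stat):
    return df.groupby(group_by)[columns_to_stat.split(" || ")].sum().reset_index()


def get_max_of_columns(df, group_by, columns_to_stat):
    return df.groupby(group_by)[columns_to_stat.split(" || ")].max().reset_index()


def ratio(df, group_by, numerator_variable, denominator_variable, ratio_name):
    sums = df.groupby(group_by)[[numerator_variable, denominator_variable]].sum()
    return (sums[numerator_variable] / sums[denominator_variable]).rename(ratio_name).reset_index()


STATISTICS = {"SUM": get_sum_of_columns, "MAX": get_max_of_columns, "RATIO_V1": ratio}


def call_statistic(func, df, group_by, row):
    # Skip the first two parameters (df, group_by); fill the rest from the csv row.
    extra = list(inspect.signature(func).parameters)[2:]
    kwargs = {name: row[name.upper()] for name in extra}
    return func(df, group_by, **kwargs)


def parse_report_input_table(model_to_summarise, report_inputs):
    group_by_variables = report_inputs["GROUP_BY_VARIABLES"].iloc[0].split(" || ")
    reports = [
        call_statistic(STATISTICS[row["STATISTIC"]], model_to_summarise,
                       group_by_variables, row)
        for _, row in report_inputs.iterrows()
    ]
    return reports, group_by_variables
```

With this approach a new statistic needs only its function and one dictionary entry, at the cost of coupling parameter names to csv headers.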