我正在创建一个具有教育特征的人口模型。 我从人口的初始图片开始,该人口数量给出了每个年龄段(0到95)和每个教育水平(0-没有教育,到6-大学)的人数。
此图片被视为数据框的一列,并将每年作为新的预测迭代填充。 为了被填充,需要进行假设或诸如每个年龄组的死亡率,每个教育水平的入学率和成功率之类的事情。
我解决问题的方法是添加一个新列,并使用上一年的age-1值遍历各行,以计算新值(例如,年龄为5岁的男性人数为1岁时年龄为4岁的男性人数减去死亡人数)
此解决方案的问题在于,使用for循环和.loc遍历熊猫数据帧行效率很低,并且计算预测需要花费大量时间
def add_year_temp(pop_table,time,
old_year,new_year,
enrollment_rate_primary,
success_rate_primary,
enrollment_rate_1st_cycle,
success_rate_1st_cycle,
enrollment_rate_2nd_cycle,
success_rate_2nd_cycle,
enrollment_rate_3rd_cycle,
success_rate_3rd_cycle,
enrollment_rate_university,
success_rate_university,
mortality_rate_0_1,
mortality_rate_2_14,
mortality_rate_15_64,
mortality_rate_65,
mortality_mf_ratio,
enrollment_mf_ratio,
success_mf_ratio):
temp_table = pop_table
temp_table['year_ts'] = pd.to_datetime(temp_table[time])
temp_table['lag']= temp_table.groupby(['sex','schooling'])[old_year].shift(+1)
temp_table = temp_table.fillna(0)
for age in temp_table['age'].unique():
for sex in temp_table['sex'].unique():
mortality_mf_ratio_temp = 1
enrollment_mf_ratio_temp = 1
success_mf_ratio_temp = 1
if sex == 'F':
mortality_mf_ratio_temp = mortality_mf_ratio
enrollment_mf_ratio_temp = enrollment_mf_ratio
success_mf_ratio_temp = success_mf_ratio
if age <= 1:
for schooling in [0]:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)]['lag']) \
* (1 - mortality_rate_0_1 * mortality_mf_ratio_temp)
elif 1 < age <= 5:
for schooling in [0]:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)]['lag']) \
* (1 - mortality_rate_2_14 * mortality_mf_ratio_temp)
elif 15 < age <= 17:
for schooling in [0 ,1 ,2 ,3 ,4]:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==age-1) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)][old_year]) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
elif age == 18:
for schooling in [0 ,1 ,2, 3, 4, 5]:
if schooling == 0:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)]['lag']) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
elif schooling == 1:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==(age-1)) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)][old_year]) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
elif schooling == 2:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==(age-1)) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)][old_year]) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
elif schooling == 3:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==(age-1)) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)][old_year]) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
elif schooling == 4:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==(age-1)) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling)][old_year]) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp) \
* (1 - enrollment_rate_3rd_cycle * enrollment_mf_ratio_temp \
* success_rate_3rd_cycle * success_mf_ratio_temp)
elif schooling == 5:
temp_table.loc[(temp_table['age']==age) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling),'lag'] = \
float(temp_table[(temp_table['age']==(age-1)) \
& (temp_table['sex']== sex) \
& (temp_table['schooling']== schooling-1)][old_year]) \
* (1 - mortality_rate_15_64 * mortality_mf_ratio_temp) \
* (enrollment_rate_3rd_cycle * enrollment_mf_ratio_temp \
* success_rate_3rd_cycle * success_mf_ratio_temp)
就像我说的那样,它确实有效,但这既不优雅也不快速...
答案 0 :(得分:0)
没有看到可验证的输出-https://stackoverflow.com/help/mcve-您可以使用:
temp_table['mortality_mf_ratio'] = temp_table.apply(lambda row: some_function_per_row(row), axis=1)
或者您可以使用np.where
https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
temp_table['mortality_mf_ratio'] = np.where(temp_table['sex'] == 'F', 1, 0)