Efficient way to iterate over the rows of a Pandas DataFrame

Time: 2019-05-07 10:51:49

Tags: python pandas

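I am building a population model that includes educational attainment. I start from an initial snapshot of the population, which gives the number of people for each age (0 to 95) and each education level (from 0 = no education to 6 = university).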

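This snapshot is stored as one column of a DataFrame, and each following year is filled in as a new projection iteration. Filling it in requires assumptions such as the mortality rate for each age group and the enrollment and success rates for each education level.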

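My approach is to add a new column and loop over the rows, using the previous year's value at age - 1 to compute the new value (for example, the number of 5-year-old males is the number of 4-year-old males one year earlier, minus those who died).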

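The problem with this solution is that iterating over pandas DataFrame rows with for loops and .loc is very inefficient, so computing the projection takes a long time.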

import pandas as pd

def add_year_temp(pop_table, time,
                  old_year, new_year,
                  enrollment_rate_primary,
                  success_rate_primary,
                  enrollment_rate_1st_cycle,
                  success_rate_1st_cycle,
                  enrollment_rate_2nd_cycle,
                  success_rate_2nd_cycle,
                  enrollment_rate_3rd_cycle,
                  success_rate_3rd_cycle,
                  enrollment_rate_university,
                  success_rate_university,
                  mortality_rate_0_1,
                  mortality_rate_2_14,
                  mortality_rate_15_64,
                  mortality_rate_65,
                  mortality_mf_ratio,
                  enrollment_mf_ratio,
                  success_mf_ratio):

    # Plain assignment, not a copy: the two column assignments below also modify pop_table
    temp_table = pop_table
    temp_table['year_ts'] = pd.to_datetime(temp_table[time])
    # Previous row's old_year value within each (sex, schooling) group
    temp_table['lag'] = temp_table.groupby(['sex', 'schooling'])[old_year].shift(+1)
    temp_table = temp_table.fillna(0)

    for age in temp_table['age'].unique():
        for sex in temp_table['sex'].unique():

            # Sex adjustment factors: 1 by default, the given ratios for 'F'
            mortality_mf_ratio_temp = 1
            enrollment_mf_ratio_temp = 1
            success_mf_ratio_temp = 1

            if sex == 'F':
                mortality_mf_ratio_temp = mortality_mf_ratio
                enrollment_mf_ratio_temp = enrollment_mf_ratio
                success_mf_ratio_temp = success_mf_ratio

            if age <= 1:
                for schooling in [0]:
                    temp_table.loc[(temp_table['age'] == age) \
                                   & (temp_table['sex'] == sex) \
                                   & (temp_table['schooling'] == schooling), 'lag'] = \
                        float(temp_table[(temp_table['age'] == age) \
                                         & (temp_table['sex'] == sex) \
                                         & (temp_table['schooling'] == schooling)]['lag']) \
                        * (1 - mortality_rate_0_1 * mortality_mf_ratio_temp)
            elif 1 < age <= 5:
                for schooling in [0]:
                    temp_table.loc[(temp_table['age'] == age) \
                                   & (temp_table['sex'] == sex) \
                                   & (temp_table['schooling'] == schooling), 'lag'] = \
                        float(temp_table[(temp_table['age'] == age) \
                                         & (temp_table['sex'] == sex) \
                                         & (temp_table['schooling'] == schooling)]['lag']) \
                        * (1 - mortality_rate_2_14 * mortality_mf_ratio_temp)

Many lines later, you can see, for example, how I define the people who finish high school and enter university...

            elif 15 < age <= 17:
                for schooling in [0, 1, 2, 3, 4]:
                    temp_table.loc[(temp_table['age'] == age) \
                                   & (temp_table['sex'] == sex) \
                                   & (temp_table['schooling'] == schooling), 'lag'] = \
                        float(temp_table[(temp_table['age'] == age - 1) \
                                         & (temp_table['sex'] == sex) \
                                         & (temp_table['schooling'] == schooling)][old_year]) \
                        * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
            elif age == 18:
                for schooling in [0, 1, 2, 3, 4, 5]:
                    if schooling == 0:
                        temp_table.loc[(temp_table['age'] == age) \
                                       & (temp_table['sex'] == sex) \
                                       & (temp_table['schooling'] == schooling), 'lag'] = \
                            float(temp_table[(temp_table['age'] == age) \
                                             & (temp_table['sex'] == sex) \
                                             & (temp_table['schooling'] == schooling)]['lag']) \
                            * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
                    elif schooling == 1:
                        temp_table.loc[(temp_table['age'] == age) \
                                       & (temp_table['sex'] == sex) \
                                       & (temp_table['schooling'] == schooling), 'lag'] = \
                            float(temp_table[(temp_table['age'] == (age - 1)) \
                                             & (temp_table['sex'] == sex) \
                                             & (temp_table['schooling'] == schooling)][old_year]) \
                            * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
                    elif schooling == 2:
                        temp_table.loc[(temp_table['age'] == age) \
                                       & (temp_table['sex'] == sex) \
                                       & (temp_table['schooling'] == schooling), 'lag'] = \
                            float(temp_table[(temp_table['age'] == (age - 1)) \
                                             & (temp_table['sex'] == sex) \
                                             & (temp_table['schooling'] == schooling)][old_year]) \
                            * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
                    elif schooling == 3:
                        temp_table.loc[(temp_table['age'] == age) \
                                       & (temp_table['sex'] == sex) \
                                       & (temp_table['schooling'] == schooling), 'lag'] = \
                            float(temp_table[(temp_table['age'] == (age - 1)) \
                                             & (temp_table['sex'] == sex) \
                                             & (temp_table['schooling'] == schooling)][old_year]) \
                            * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp)
                    elif schooling == 4:
                        temp_table.loc[(temp_table['age'] == age) \
                                       & (temp_table['sex'] == sex) \
                                       & (temp_table['schooling'] == schooling), 'lag'] = \
                            float(temp_table[(temp_table['age'] == (age - 1)) \
                                             & (temp_table['sex'] == sex) \
                                             & (temp_table['schooling'] == schooling)][old_year]) \
                            * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp) \
                            * (1 - enrollment_rate_3rd_cycle * enrollment_mf_ratio_temp \
                               * success_rate_3rd_cycle * success_mf_ratio_temp)
                    elif schooling == 5:
                        temp_table.loc[(temp_table['age'] == age) \
                                       & (temp_table['sex'] == sex) \
                                       & (temp_table['schooling'] == schooling), 'lag'] = \
                            float(temp_table[(temp_table['age'] == (age - 1)) \
                                             & (temp_table['sex'] == sex) \
                                             & (temp_table['schooling'] == schooling - 1)][old_year]) \
                            * (1 - mortality_rate_15_64 * mortality_mf_ratio_temp) \
                            * (enrollment_rate_3rd_cycle * enrollment_mf_ratio_temp \
                               * success_rate_3rd_cycle * success_mf_ratio_temp)

The code continues in the same way for all the remaining age groups.

As I said, it does work, but it is neither elegant nor fast...

1 Answer:

Answer 0 (score: 0)

Without a verifiable example and its expected output (see https://stackoverflow.com/help/mcve), one option is:

temp_table['mortality_mf_ratio'] = temp_table.apply(lambda row: some_function_per_row(row), axis=1)
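
To make the apply suggestion more concrete, here is a minimal sketch. The helper survival_factor, its default rates, and the toy table are all hypothetical and not taken from the question; they only illustrate what a per-row function passed to apply might look like.

```python
import pandas as pd

def survival_factor(row, mortality_rate=0.01, mortality_mf_ratio=0.9):
    """Hypothetical per-row helper: share of a group that survives one year."""
    # Same idea as in the question: the ratio applies to 'F', otherwise 1
    ratio = mortality_mf_ratio if row["sex"] == "F" else 1.0
    return 1.0 - mortality_rate * ratio

# Toy data, made up for illustration
temp_table = pd.DataFrame({
    "age": [30, 30, 31, 31],
    "sex": ["M", "F", "M", "F"],
    "count": [1000.0, 1000.0, 900.0, 950.0],
})

# apply(..., axis=1) calls the function once per row
temp_table["survival"] = temp_table.apply(survival_factor, axis=1)
temp_table["survivors"] = temp_table["count"] * temp_table["survival"]
```

Note that apply with axis=1 still runs Python code once per row, so it mostly improves readability; it is typically much slower than an expression that operates on whole columns.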

Or you can use np.where: https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

temp_table['mortality_mf_ratio'] = np.where(temp_table['sex'] == 'F', 1, 0)
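
Building on np.where, the mortality part of one projection step could be written without any Python loop. The following is only a sketch under stated assumptions: the rate values, the toy table, and the column names "2019" and "2020" are invented for illustration, and the rows are assumed to be sorted by age within each (sex, schooling) group. It mirrors the question's logic, where the sex ratio is the given ratio for 'F' and 1 otherwise, and uses np.select for the age bands.

```python
import numpy as np
import pandas as pd

# Illustrative values only; none of these numbers come from the question
mortality_rate_0_1, mortality_rate_2_14 = 0.05, 0.01
mortality_rate_15_64, mortality_rate_65 = 0.005, 0.04
mortality_mf_ratio = 0.9

# Toy table: one row per (sex, schooling, age), sorted by age within each group
temp_table = pd.DataFrame({
    "sex": ["M"] * 4 + ["F"] * 4,
    "schooling": [0] * 8,
    "age": [0, 1, 2, 3] * 2,
    "2019": [100.0, 95.0, 90.0, 85.0] * 2,
})

# Sex adjustment: the ratio for 'F', 1 otherwise
sex_ratio = np.where(temp_table["sex"] == "F", mortality_mf_ratio, 1.0)

# Age-band mortality; np.select uses the first condition that is True
age = temp_table["age"]
base_rate = np.select(
    [age <= 1, age <= 14, age <= 64],
    [mortality_rate_0_1, mortality_rate_2_14, mortality_rate_15_64],
    default=mortality_rate_65,
)

# Previous year's count at age - 1 within each (sex, schooling) group;
# age 0 has no predecessor here (births are not modelled), so it becomes 0
lag = temp_table.groupby(["sex", "schooling"])["2019"].shift(1).fillna(0)

# Whole-column arithmetic replaces the nested loops and .loc lookups
temp_table["2020"] = lag * (1 - base_rate * sex_ratio)
```

The transitions between schooling levels (such as moving from level 4 to level 5 at age 18) are not covered by this sketch; they would need an additional lookup on the previous schooling level, combined with the enrollment and success rates.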