Adding a new row to a pandas DataFrame on each iteration

Asked: 2018-10-23 12:50:19

Tags: python pandas

Adding a new row to a dataframe with correct mapping in pandas

My problem is similar to the question above.

      carrier_plan_identifier           ...            hios_issuer_identifier
1                        AUSK           ...                           99806.0
2                        AUSM           ...                           99806.0
3                        AUSN           ...                           99806.0
4                        AUSS           ...                           99806.0
5                        AUST           ...                           99806.0

I need to select multiple columns, say carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier.

Using these 3 columns, I need to run a select query, for example:

select id from table_name where carrier_plan_identifier = 'something' and wellthie_issuer_identifier = 'something' and hios_issuer_identifier = 'something'

I need to add the id column back to the existing dataframe.

Currently, I am doing something like this:

for index, frame in df_with_servicearea.iterrows():

            if frame['service_area_id'] and frame['issuer_id']:
                # reading from medical plans table
                medical_plan_id = getmodeldata.get_medicalplans(sess, frame['issuer_id'], frame['hios_plan_identifier'], frame['plan_year'],
                                                                frame['group_or_individual_plan_type'])

                frame['medical_plan_id'] = medical_plan_id
                df_with_servicearea.append(frame)

When I do frame['medical_plan_id'] = medical_plan_id, nothing is added. However, when I do df_with_servicearea['medical_plan_id'] = medical_plan_id, only the last value from the loop is added to all rows. I am not sure whether this is the right approach.
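A likely explanation for both symptoms: iterrows() yields each row as a separate Series, so assigning a new key to it never reaches the DataFrame, and assigning a scalar to a whole column broadcasts that scalar to every row. Below is a minimal sketch with made-up data; fake_lookup is a hypothetical stand-in for get_medicalplans that returns a single id. It collects one value per row and assigns the column once.

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
df = pd.DataFrame({
    "issuer_id": [10, 10],
    "hios_plan_identifier": ["A1", "B2"],
})


def fake_lookup(issuer_id, hios_plan_identifier):
    # Hypothetical stand-in for getmodeldata.get_medicalplans.
    ids = {(10, "A1"): 879519, (10, "B2"): 879520}
    return ids[(issuer_id, hios_plan_identifier)]


# Assigning to the row object only changes that temporary Series.
for index, frame in df.iterrows():
    frame["medical_plan_id"] = fake_lookup(
        frame["issuer_id"], frame["hios_plan_identifier"]
    )
columns_after_row_assignment = list(df.columns)  # still the 2 original columns

# Collect one value per row, then assign the whole column in one step.
values = []
for index, frame in df.iterrows():
    values.append(fake_lookup(frame["issuer_id"], frame["hios_plan_identifier"]))
df["medical_plan_id"] = values
```

The DataFrame keeps its two rows and gains one column, with a distinct value per row.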

Update:

After using the following, I get 4 rows instead of the 2 that should be there.

df_with_servicearea = df_with_servicearea.append(frame)



 wellthie_issuer_identifier       ...       medical_plan_id
0                   UHC99806       ...                   NaN
1                   UHC99806       ...                   NaN
0                   UHC99806       ...              879519.0
1                   UHC99806       ...              879520.0

Update 2: implemented based on Mayank's answer. Hi Mayank, is this what you suggested?

for index, frame in df_with_servicearea.iterrows():

    if frame['service_area_id'] and frame['issuer_id']:
        # reading from medical plans table
        df_new = getmodeldata.get_medicalplans(sess, frame['issuer_id'], frame['hios_plan_identifier'], frame['plan_year'],
                                               frame['group_or_individual_plan_type'])
        df_new.columns = ['medical_plan_id', 'issuer_id', 'hios_plan_identifier', 'plan_year',
                          'group_or_individual_plan_type']
        new_df = pd.merge(df_with_servicearea, df_new, on=['issuer_id', 'hios_plan_identifier', 'plan_year', 'group_or_individual_plan_type'], how='left')

print new_df

My get_medicalplans function runs a select query internally.

def get_medicalplans(self,sess, issuerid, hios_plan_identifier, plan_year, group_or_individual_plan_type):
    try:
        medical_plan = sess.query(MedicalPlan.id, MedicalPlan.issuer_id, MedicalPlan.hios_plan_identifier,
                                     MedicalPlan.plan_year, MedicalPlan.group_or_individual_plan_type).filter(MedicalPlan.issuer_id == issuerid,
                                     MedicalPlan.hios_plan_identifier == hios_plan_identifier,
                                     MedicalPlan.plan_year == plan_year,
                                     MedicalPlan.group_or_individual_plan_type == group_or_individual_plan_type)
        sess.commit()
        return pd.read_sql(medical_plan.statement, medical_plan.session.bind) 

2 Answers:

Answer 0 (score: 0)

The simplest fix is to change the last line to:

    df_with_servicearea = df_with_servicearea.append(frame)
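Two caveats about this line. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat is the replacement. Also, appending adds rows rather than a column, which is consistent with the 4 rows reported in the question's update. A small sketch with made-up data (the ids are invented for illustration) reproduces that effect with pd.concat:

```python
import pandas as pd

# Made-up data with the same shape as the question's example.
df = pd.DataFrame({"wellthie_issuer_identifier": ["UHC99806", "UHC99806"]})

new_rows = []
for index, frame in df.iterrows():
    frame["medical_plan_id"] = 879519 + index  # invented ids
    new_rows.append(frame)

# Stand-in for repeated df.append(frame) on current pandas versions.
appended = pd.concat([df, pd.DataFrame(new_rows)])
# appended has 4 rows: the 2 originals (NaN id) plus the 2 modified copies.
```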

However, if you want to add a new column, use:

import numpy as np  # needed for np.nan

df_with_servicearea['medical_plan_id'] = df_with_servicearea.apply(
    lambda row:
    getmodeldata.get_medicalplans(sess,
                                  row['issuer_id'],
                                  row['hios_plan_identifier'],
                                  row['plan_year'],
                                  row['group_or_individual_plan_type']
                                  )
    if row['service_area_id']
    and row['issuer_id']
    else np.nan,
    axis=1)  # axis=1 passes each row (not each column) to the lambda
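To illustrate the row-wise apply, here is a self-contained sketch. The data is made up, and fake_lookup is a hypothetical stand-in for get_medicalplans that returns a single id. Note that the get_medicalplans shown in the question returns a DataFrame, so its result would first have to be reduced to a scalar for this pattern to work.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the second row has a falsy service_area_id.
df = pd.DataFrame({
    "service_area_id": [1, 0],
    "issuer_id": [10, 10],
    "hios_plan_identifier": ["A1", "B2"],
})


def fake_lookup(issuer_id, hios_plan_identifier):
    # Hypothetical stand-in for a lookup that returns one id (or NaN).
    ids = {(10, "A1"): 879519, (10, "B2"): 879520}
    return ids.get((issuer_id, hios_plan_identifier), np.nan)


# axis=1 makes apply call the lambda once per row.
df["medical_plan_id"] = df.apply(
    lambda row: fake_lookup(row["issuer_id"], row["hios_plan_identifier"])
    if row["service_area_id"] and row["issuer_id"]
    else np.nan,
    axis=1,
)
```

This still performs one lookup per row, so for a database-backed lookup it issues one query per row.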

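Answer 1 (score: 0)

Try this:

Assuming you want to update the original df based on the following 3 columns:

1.) Adjust the query fired on the database so that the select clause includes the columns carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier: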

select id,carrier_plan_identifier, wellthie_issuer_identifier,hios_issuer_identifier from table_name where carrier_plan_identifier = 'something' and wellthie_issuer_identifier = 'something' and hios_issuer_identifier = 'something'

2.) Create a dataframe from the above result.

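# Name the columns so that the merge below can find its keys
df = pd.DataFrame(cur.fetchall(), columns=['id', 'carrier_plan_identifier', 'wellthie_issuer_identifier', 'hios_issuer_identifier'])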

3.) The df above now has the id column and the other 3 columns. Now merge df with original_df on the columns:

carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier

original_df = pd.merge(original_df,df, on=['carrier_plan_identifier','wellthie_issuer_identifier','hios_issuer_identifier'],how='outer')

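So you have to understand what is happening here. (Edit: changed left join to outer join.) I am merging original_df with the query dataframe (df) on the columns carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier, which appends the id column to original_df because it does not exist there yet. Wherever a match is found, the value of the id column from df is copied into original_df; where there is no match, the id column will have NaN. You don't have to use any loops. Just try my code.

This will add the id column to original_df for all matching rows. Rows for which no match is found will have id as NaN.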
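The three steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the in-memory SQLite table named plans and all of its rows are made up, and the column names are read from cur.description so that the merge keys exist in df.

```python
import sqlite3

import pandas as pd

# Made-up in-memory table standing in for the real database table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plans (id INTEGER, carrier_plan_identifier TEXT, "
    "wellthie_issuer_identifier TEXT, hios_issuer_identifier REAL)"
)
conn.executemany(
    "INSERT INTO plans VALUES (?, ?, ?, ?)",
    [(1, "AUSK", "UHC99806", 99806.0), (2, "AUSM", "UHC99806", 99806.0)],
)

keys = [
    "carrier_plan_identifier",
    "wellthie_issuer_identifier",
    "hios_issuer_identifier",
]

# Made-up original frame; AUSN has no matching row in the table.
original_df = pd.DataFrame({
    "carrier_plan_identifier": ["AUSK", "AUSM", "AUSN"],
    "wellthie_issuer_identifier": ["UHC99806"] * 3,
    "hios_issuer_identifier": [99806.0] * 3,
})

# Steps 1 and 2: one query for all rows, with named columns.
cur = conn.execute("SELECT id, " + ", ".join(keys) + " FROM plans")
df = pd.DataFrame(cur.fetchall(), columns=[col[0] for col in cur.description])

# Step 3: a single merge instead of a per-row loop.
merged = pd.merge(original_df, df, on=keys, how="outer")
conn.close()
```

With how='outer', rows that exist only in the query result would also appear in the output; how='left' would keep exactly the rows of original_df.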

You can replace NaN with any other value.
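One common way to replace NaN is pandas' fillna. The following is a minimal sketch with made-up data and an arbitrary placeholder value, and is not necessarily the exact code the answer had in mind:

```python
import numpy as np
import pandas as pd

# Made-up merge result in which one row found no match.
original_df = pd.DataFrame({
    "carrier_plan_identifier": ["AUSK", "AUSN"],
    "id": [1.0, np.nan],
})

# -1 is an arbitrary placeholder chosen for this sketch.
original_df["id"] = original_df["id"].fillna(-1)
```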

Let me know if this helps.