Question

I want to add a new column to an existing DataFrame.

This is what I'm doing:

def test(self, sess, df):

    for index, frame in df.iterrows():
        medical_plan = sess.query(MedicalPlan.id).filter(MedicalPlan.issuer_id == frame['issuer_id'],
                                  MedicalPlan.hios_plan_identifier == frame['hios_plan_identifier'],
                                  MedicalPlan.plan_year == frame['plan_year'],
                                  MedicalPlan.group_or_individual_plan_type == frame['group_or_individual_plan_type']).first()
        sess.commit()
        frame['medical_plan_id'] = list(medical_plan)[0]
        df = df.append(frame)
    print df

Before the loop, df is:

  wellthie_issuer_identifier       ...       service_area_id
0                   UHC99806       ...                     1

[1 rows x 106 columns]

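The new column and its value should be added to this existing row. Instead I get 2 rows, and only the value from the last loop iteration is inserted. After the loop the column exists in df, but the data is wrong: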

 wellthie_issuer_identifier       ...       medical_plan_id
0                   UHC99806       ...                   NaN
0                   UHC99806       ...              879519.0

[2 rows x 107 columns]
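The two rows come from how iterrows behaves here: for a mixed-dtype frame like this one, each frame it yields is a copy of the row, so frame['medical_plan_id'] = ... changes only that copy, and df.append(frame) then adds the modified copy as a second row with the same index label. A minimal pandas-only reproduction, assuming a two-column stand-in for the real 106-column frame and a hard-coded plan id in place of the query. It uses pd.concat because DataFrame.append was removed in pandas 2.0:

```python
import pandas as pd

# Two-column stand-in for the original 106-column frame (illustrative only).
df = pd.DataFrame({"wellthie_issuer_identifier": ["UHC99806"], "service_area_id": [1]})

for index, frame in df.iterrows():
    # `frame` is a copy of the row, so this assignment never reaches `df`.
    frame["medical_plan_id"] = 879519  # hard-coded stand-in for the query result
    # Appending the modified copy adds a second row labelled 0
    # (pd.concat plays the role of the removed DataFrame.append).
    df = pd.concat([df, frame.to_frame().T])

print(df)
```

The original row keeps NaN in the new column, while the appended copy carries the looked-up value, which matches the output shown above.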

How can I achieve this? The output I expect is:

 wellthie_issuer_identifier       ...       service_area_id  medical_plan_id
0                   UHC99806       ...                     1    879519.0

[1 rows x 107 columns]
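One way to get that single-row result while keeping the loop is to write each value back into df by index label instead of appending the row copy. A sketch, assuming a hypothetical lookup_plan_id function in place of the SQLAlchemy query:

```python
import pandas as pd


def lookup_plan_id(row):
    """Hypothetical stand-in for the database query in the question."""
    return 879519


df = pd.DataFrame({"wellthie_issuer_identifier": ["UHC99806"], "service_area_id": [1]})

for index, row in df.iterrows():
    # Write into the original frame by label; .loc creates the column on first use.
    df.loc[index, "medical_plan_id"] = lookup_plan_id(row)

print(df)
```

Rows that are never assigned in a newly created column are filled with NaN, so the row count stays the same.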

Attempt 1:

I called the get_id method like this:

def test(self, sess, df):
    print ("in test", df)
    for index, frame in df.iterrows():
        id = self.get_id(sess, frame)
        df['medical_plan_id'] = df.apply(id, axis=1)
    print df
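In this attempt id is already the result of a single lookup (one value), not a function, so df.apply(id, axis=1) has nothing to call; the loop would also recompute the whole column on every pass. DataFrame.apply with axis=1 expects a callable that receives one row and returns one value. A small sketch with made-up columns and a hypothetical get_id that computes a value from the row instead of querying a database:

```python
import pandas as pd

df = pd.DataFrame({"issuer_id": [1, 2], "plan_year": [2018, 2019]})


def get_id(row):
    # Hypothetical lookup: receives one row (a Series) and returns one value.
    return row["issuer_id"] * 10000 + row["plan_year"]


# Pass the function itself, not its result. apply calls it once per row (axis=1)
# and collects the return values into a Series aligned with df.index.
df["medical_plan_id"] = df.apply(get_id, axis=1)
print(df)
```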

Answer 1

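def test(self, sess, df):
    def get_id(frame):
        # Look up the plan that matches this row's four key columns.
        medical_plan = sess.query(MedicalPlan.id).filter(
            MedicalPlan.issuer_id == frame['issuer_id'],
            MedicalPlan.hios_plan_identifier == frame['hios_plan_identifier'],
            MedicalPlan.plan_year == frame['plan_year'],
            MedicalPlan.group_or_individual_plan_type == frame['group_or_individual_plan_type'],
        ).first()
        sess.commit()
        # .first() returns None when no plan matches; leave the cell empty in that case.
        if medical_plan is None:
            return None
        return list(medical_plan)[0]

    # apply calls get_id once per row (axis=1) and returns a Series aligned with df.index.
    df['medical_plan_id'] = df.apply(get_id, axis=1)
    print(df)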
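To try the answer's pattern end to end without a SQLAlchemy session, here is a sketch that swaps in the standard-library sqlite3 module with an in-memory database. The medical_plan table, its schema, and the sample rows are assumptions chosen to mirror the four filter columns; the column names come from a fixed list and only the values are passed as bound parameters.

```python
import sqlite3

import pandas as pd

# Assumption: a hypothetical medical_plan table whose columns mirror the answer's filter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medical_plan (id INTEGER PRIMARY KEY, issuer_id TEXT, "
    "hios_plan_identifier TEXT, plan_year INTEGER, group_or_individual_plan_type TEXT)"
)
conn.execute(
    "INSERT INTO medical_plan VALUES (879519, 'UHC', 'H001', 2018, 'Individual')"
)

ATTRIBUTES = ["issuer_id", "hios_plan_identifier", "plan_year", "group_or_individual_plan_type"]


def to_python(value):
    # NumPy scalars (e.g. numpy.int64) expose .item(); sqlite3 binds only plain Python values.
    return value.item() if hasattr(value, "item") else value


def get_id(row):
    """Return the id of the plan matching this row, or None when nothing matches."""
    where = " AND ".join(f"{attr} = ?" for attr in ATTRIBUTES)
    params = [to_python(row[attr]) for attr in ATTRIBUTES]
    match = conn.execute(f"SELECT id FROM medical_plan WHERE {where}", params).fetchone()
    return match[0] if match is not None else None


df = pd.DataFrame({
    "issuer_id": ["UHC", "UHC"],
    "hios_plan_identifier": ["H001", "H999"],
    "plan_year": [2018, 2018],
    "group_or_individual_plan_type": ["Individual", "Individual"],
})

# One call to get_id per row; the results become the new column.
df["medical_plan_id"] = df.apply(get_id, axis=1)
print(df)
```

The second row has no matching plan, so get_id returns None and that row's medical_plan_id is missing instead of raising a TypeError.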

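If you want medical_plan_id to be an integer, you can change the last line of get_id to return int(list(medical_plan)[0]). Alternatively, you might be able to build the filter from a list of column names: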

    # getattr looks up each column by name; filter() ANDs the unpacked conditions together.
    medical_plan = sess.query(MedicalPlan.id).filter(
            *[getattr(MedicalPlan, attribute) == frame[attribute] for attribute in
              ['issuer_id','hios_plan_identifier','plan_year','group_or_individual_plan_type']]).first()

or

        attributes = ['issuer_id','hios_plan_identifier','plan_year','group_or_individual_plan_type']
        # filter_by takes keyword arguments, so pass the row's key values as a dict.
        medical_plan = sess.query(MedicalPlan.id).filter_by(**frame[attributes].to_dict()).first()

(I'm not sure what kind of object MedicalPlan is, so I can't say for certain whether this works.)
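When df has many rows, one query per row can be slow. A different technique from the answer above is to load the plan table once and join it to df with DataFrame.merge on the same four key columns. A sketch, assuming a hypothetical plans DataFrame that stands in for the query result (for example one loaded with pandas.read_sql):

```python
import pandas as pd

ATTRIBUTES = ["issuer_id", "hios_plan_identifier", "plan_year", "group_or_individual_plan_type"]

# Hypothetical plan table standing in for a single query against the plan table.
plans = pd.DataFrame({
    "id": [879519, 879520],
    "issuer_id": ["UHC", "UHC"],
    "hios_plan_identifier": ["H001", "H002"],
    "plan_year": [2018, 2018],
    "group_or_individual_plan_type": ["Individual", "Group"],
})

df = pd.DataFrame({
    "issuer_id": ["UHC", "UHC"],
    "hios_plan_identifier": ["H001", "H999"],
    "plan_year": [2018, 2018],
    "group_or_individual_plan_type": ["Individual", "Individual"],
    "service_area_id": [1, 2],
})

# Keep only the key columns plus the id, renamed to the target column name.
lookup = plans.rename(columns={"id": "medical_plan_id"})[ATTRIBUTES + ["medical_plan_id"]]
# A left merge keeps every row of df; rows without a matching plan get NaN.
df = df.merge(lookup, on=ATTRIBUTES, how="left")
print(df)
```

how="left" keeps every row of df in its original order. If several plans share the same four key values, the merge duplicates the matching df rows, so the keys should be unique in the plan table.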
