Question

I am stuck on a slow for loop in pandas.

Here is the code snippet:

for ix, pt in result.iterrows():
    for index,row in frame_SuggestedDose.iterrows():
        isTrue = False
        if (pt[0]==row[0] and pt[4]==row[10]):
            # print("found")
            pt[8] = row[2]
            isTrue = True
        if(isTrue or pt[4]>datetime.now().date()):
            break
        result.loc[ix] = pt

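In the code above, 0, 10, 2, and 4 are positional indices of columns in the DataFrames.

If the patient ID in result matches the patient ID in frame_SuggestedDose and the dates are the same, I want to copy value from frame_SuggestedDose into result.

Head of the result frame: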

patientId   Date    IntervalDate    IntervalName    start_dt    Dose    FastingBloodGlucose IntervalSuggestedReason IntervalStatus  BGL SuggestedDose
006b5d  2017-09-08 20:30:00 2017-09-08 20:30:00 Int1    2017-09-08  NaN NaN suggested_dose_reason_new_care_plan NaN NaN 14.0

Head of the frame_SuggestedDose frame:

    patientId   category    value   units   effective   status  fasting hypo    suggestedDose   suggestedReason effective_dt    effective_tm    dailyDoseTime   dose_dt dose_tm
   006b5d51 DOSE_SUGGESTION 14.0    units   2017-09-08 20:30:00 active  0.0 0.0 0.0 suggested_dose_reason_new_care_plan 2017-09-08  20:30:00    1970-01-01 20:30:00 1970-01-01  20:30:00

It takes about 2 hours to execute.

How can I reduce the execution time?

I am using Jupyter Notebook.

Answer 1

Try the following code, which replaces the nested loops with a single merge:

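import pandas as pd

# Drop any stale "value" column so the merge does not produce value_x / value_y.
result = result.drop(columns="value", errors="ignore")
# Left-join the suggested dose onto result by patient and date.
# The question's loop compares column 4 (start_dt) with column 10 (effective_dt).
result = pd.merge(result,
                  frame_SuggestedDose[["patientId","effective_dt","value"]],
                  left_on=["patientId","start_dt"],
                  right_on=["patientId","effective_dt"],
                  how="left")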
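
To see what the merge does, here is a minimal, self-contained sketch on toy data. The two small frames are hypothetical stand-ins for the real tables, and it assumes the date columns in both frames hold plain date objects. It also assumes that when several suggestions exist for the same patient and date, the first one should win, since the loop in the question stops at the first match.

```python
import datetime as dt

import pandas as pd

# Toy stand-ins for the question's frames (hypothetical data, not the real tables).
result = pd.DataFrame({
    "patientId": ["006b5d", "006b5d", "11aa22"],
    "start_dt": [dt.date(2017, 9, 8), dt.date(2017, 9, 9), dt.date(2017, 9, 8)],
})
frame_SuggestedDose = pd.DataFrame({
    "patientId": ["006b5d", "11aa22"],
    "effective_dt": [dt.date(2017, 9, 8), dt.date(2017, 9, 10)],
    "value": [14.0, 20.0],
})

# Keep one suggestion per (patient, date) so the left merge cannot add rows.
# Assumption: the first match wins, as in the loop, which breaks on the first hit.
lookup = (
    frame_SuggestedDose[["patientId", "effective_dt", "value"]]
    .drop_duplicates(subset=["patientId", "effective_dt"], keep="first")
)

# how="left" keeps every row of result; rows without a match get NaN in "value".
merged = result.merge(
    lookup,
    left_on=["patientId", "start_dt"],
    right_on=["patientId", "effective_dt"],
    how="left",
).drop(columns="effective_dt")

print(merged)
```

Only the first row finds a matching patient and date, so it receives the value 14.0; the other two rows keep NaN. Because the join is done inside pandas rather than row by row in Python, it should be far faster than the nested iterrows loops.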
