Question

I have two dataframes: one is edited by users from a PowerApp, the other comes directly from OneDrive.

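The column headers are almost identical. I need to compare the two dataframes and add any new rows to the dataframe that comes from PowerApps. Here are two sample dataframes: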

PowerApps dataframe:

          Send/Collect            Hospital   Courier                      Kit                      Manufacturer  Status
0                Send     Nuffield Ipswich   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
1                 Send         BMI Rosshal   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  In Progress
2              Collect       Stepping Hill   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
3              Collect       York District  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
4  Royal Devon Exeter                  NaN       NaN  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
5              collect       Spire Bristol  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Complete
6                 Send         Bridlington  Courier        ToeMotion - MTP DF  Arthrosurface Hire Log 2018.xlsx  Not Started
7   Send Femoral Head    Hampshire Clinic        DHL             Human Tissue             Human Tissue Log.xlsx   Not Started

OneDrive dataframe:

          Send/Collect            Hospital   Courier                      Kit                      Manufacturer
0                Send     Nuffield Ipswich   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
1                 Send         BMI Rosshal   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
2              Collect       Stepping Hill   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
3              Collect       York District  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
4  Royal Devon Exeter                                 ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
5              collect       Spire Bristol  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
6  Royal Devon Exeter                                 ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
7                 Send         Bridlington  Courier        ToeMotion - MTP DF  Arthrosurface Hire Log 2018.xlsx
8   Send Femoral Head    Hampshire Clinic        DHL             Human Tissue             Human Tissue Log.xlsx

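As you can see, the PowerApps dataframe has an extra column, Status (which can contain values other than "Not Started"), while the OneDrive dataframe has extra rows (which need to go into the PowerApps df).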

Also note that in the OneDrive dataframe an empty cell is the empty string "", whereas in PowerApps it is NaN.
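Because "" and NaN never compare equal, the two frames have to be normalised before any comparison. A minimal sketch, using hypothetical two-row samples modelled on the Hospital column, of converting the OneDrive blanks to NaN with `DataFrame.replace`:

```python
import numpy as np
import pandas as pd

# Hypothetical samples: OneDrive stores a blank cell as "", PowerApps as NaN.
onedrive = pd.DataFrame({"Hospital": ["Nuffield Ipswich", ""]})
powerapp = pd.DataFrame({"Hospital": ["Nuffield Ipswich", np.nan]})

# Before normalising, the blank OneDrive cell is not recognised as missing.
before = onedrive["Hospital"].isna().tolist()

# Replace exact empty strings with NaN so both frames mark blanks the same way.
onedrive = onedrive.replace("", np.nan)
after = onedrive["Hospital"].isna().tolist()

print(before, after)
```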

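I need to merge the extra rows from OneDrive into PowerApps (setting Status to "Not Started" for those rows). I think I need a method that merges based on matches in columns 0, 3 and 4 while ignoring columns 1, 2 and 5.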
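One way to express "merge on columns 0, 3 and 4" is a left anti-join with `merge(..., indicator=True)`. A minimal sketch, assuming the key columns are Send/Collect, Kit and Manufacturer, and that `append_new_rows` and `_n` are hypothetical names. Row 6 of the OneDrive frame repeats row 4 exactly, so matching on content alone would find nothing new; the sketch therefore numbers repeated keys with `groupby(...).cumcount()` and joins on that occurrence number too. This assumes rows sharing a key appear in the same relative order in both frames.

```python
import numpy as np
import pandas as pd

# Assumption: columns 0, 3 and 4 from the question identify a row.
KEY_COLS = ["Send/Collect", "Kit", "Manufacturer"]


def append_new_rows(powerapp: pd.DataFrame, onedrive: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append OneDrive rows missing from PowerApps as 'Not Started'."""
    # Treat OneDrive's empty strings the way PowerApps stores blanks (NaN).
    onedrive = onedrive.replace("", np.nan)
    # Number repeated keys 0, 1, 2, ... so a key seen twice in OneDrive but
    # once in PowerApps still produces one new row.
    pa = powerapp.assign(_n=powerapp.groupby(KEY_COLS, dropna=False).cumcount())
    od = onedrive.assign(_n=onedrive.groupby(KEY_COLS, dropna=False).cumcount())
    # Left anti-join: keep OneDrive rows whose (key, occurrence) pair is absent in PowerApps.
    merged = od.merge(pa[KEY_COLS + ["_n"]], on=KEY_COLS + ["_n"],
                      how="left", indicator=True)
    new_rows = merged.loc[merged["_merge"] == "left_only", list(onedrive.columns)]
    new_rows = new_rows.assign(Status="Not Started")
    # PowerApps rows (and their edited Status values) stay untouched at the top.
    return pd.concat([powerapp, new_rows], ignore_index=True)
```

Usage: `powerapp = append_new_rows(powerapp, onedrive)`. Columns 1, 2 and 5 are ignored for matching, but the new rows still carry their Hospital and Courier values.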

Answer 1

I think concat fits here:

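import numpy as np
import pandas as pd

# replace the empty strings in the OneDrive dataframe with NaN
onedrive = onedrive.replace('', np.nan)
# PowerApps first, so its edited Status values win when duplicates are dropped
powerapp = pd.concat([powerapp, onedrive])

powerapp['Status'] = powerapp['Status'].fillna('Not Started')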

Then drop the redundant rows based on a subset of columns.
Note: reset the index after merging.
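The concat, de-duplicate and reset-index steps described above might fit together like this. A minimal sketch, assuming every column shared by both frames is the subset that identifies a row; `concat_and_dedupe` is a hypothetical name. `drop_duplicates` treats NaN values as equal, which is why the "" to NaN replacement has to happen first. One limitation: a OneDrive row that legitimately repeats an existing row verbatim (row 6 in the sample) is indistinguishable from a duplicate and will be dropped, so counting occurrences per key would be needed for that case.

```python
import numpy as np
import pandas as pd


def concat_and_dedupe(powerapp: pd.DataFrame, onedrive: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: concat both frames, drop duplicates, fill Status."""
    # Assumption: all columns present in both frames identify a row.
    shared = [c for c in onedrive.columns if c in powerapp.columns]
    # Normalise blanks so "" in OneDrive matches NaN in PowerApps.
    onedrive = onedrive.replace("", np.nan)
    # PowerApps first: keep="first" then retains its edited Status values.
    combined = pd.concat([powerapp, onedrive], ignore_index=True)
    combined = combined.drop_duplicates(subset=shared, keep="first")
    # Rows that only came from OneDrive have no Status yet.
    combined["Status"] = combined["Status"].fillna("Not Started")
    return combined.reset_index(drop=True)
```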

See also: Pandas - find unusual rows between two dataframes when only looking at certain columns
