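Pandas - find the odd rows between two dataframes when only looking at certain columns

Time: 2018-08-31 14:34:42

Tags: python pandas dataframe

I have two dataframes: one is edited by users from a PowerApp, the other comes straight from OneDrive.

The column headers are almost identical. I need to compare the two dataframes and add any new rows to the dataframe that comes from PowerApps. Here are two example dataframes:

PowerApps dataframe: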

          Send/Collect            Hospital   Courier                      Kit                      Manufacturer  Status
0                Send     Nuffield Ipswich   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
1                 Send         BMI Rosshal   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  In Progress
2              Collect       Stepping Hill   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
3              Collect       York District  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
4  Royal Devon Exeter                  NaN       NaN  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
5              collect       Spire Bristol  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Complete
6                 Send         Bridlington  Courier        ToeMotion - MTP DF  Arthrosurface Hire Log 2018.xlsx  Not Started
7   Send Femoral Head    Hampshire Clinic        DHL             Human Tissue             Human Tissue Log.xlsx   Not Started

OneDrive dataframe:

          Send/Collect            Hospital   Courier                      Kit                      Manufacturer
0                Send     Nuffield Ipswich   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
1                 Send         BMI Rosshal   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
2              Collect       Stepping Hill   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
3              Collect       York District  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
4  Royal Devon Exeter                                 ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
5              collect       Spire Bristol  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
6  Royal Devon Exeter                                 ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
7                 Send         Bridlington  Courier        ToeMotion - MTP DF  Arthrosurface Hire Log 2018.xlsx
8   Send Femoral Head    Hampshire Clinic        DHL             Human Tissue             Human Tissue Log.xlsx 

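As you can see, the PowerApps dataframe has an extra column, Status (which can hold different values, not only "Not Started"), while the OneDrive dataframe has an extra row that needs to go into the PowerApps dataframe.

Also note that empty cells in the OneDrive dataframe are the empty string "", whereas in the PowerApps dataframe they are NaN.

I need to merge the extra rows from OneDrive into PowerApps, adding the status "Not Started" to each such row. I think I need a method that merges based on similarity in columns 0, 3 and 4 while ignoring columns 1, 2 and 5.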
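One way to picture "merge on some columns, ignore the rest" is a left merge with an indicator column, which flags the OneDrive rows whose key columns have no match in PowerApps. The following is only a minimal sketch under assumptions: the two small frames are made-up stand-ins for the real data, and `key_cols`, `known_keys`, `flagged` and `new_rows` are illustrative names.

```python
import pandas as pd

# Hypothetical, cut-down stand-ins for the two real dataframes.
powerapp = pd.DataFrame({
    "Send/Collect": ["Send", "Collect"],
    "Kit": ["ActivMotion (HTO - DFO)", "ActivMotion (HTO - DFO)"],
    "Manufacturer": ["NewClip Hire Log 2018.xlsx", "NewClip Hire Log 2018.xlsx"],
    "Status": ["In Progress", "Not Started"],
})
onedrive = pd.DataFrame({
    "Send/Collect": ["Send", "Collect", "Send Femoral Head"],
    "Kit": ["ActivMotion (HTO - DFO)", "ActivMotion (HTO - DFO)", "Human Tissue"],
    "Manufacturer": ["NewClip Hire Log 2018.xlsx", "NewClip Hire Log 2018.xlsx",
                     "Human Tissue Log.xlsx"],
})

# Assumption: these columns (positions 0, 3 and 4 in the question) identify a row.
key_cols = ["Send/Collect", "Kit", "Manufacturer"]

# Unique key combinations already present in powerapp.
known_keys = powerapp[key_cols].drop_duplicates()

# indicator=True adds a "_merge" column saying where each row was found.
flagged = onedrive.merge(known_keys, on=key_cols, how="left", indicator=True)

# Keep only the rows that exist in onedrive but not in powerapp.
new_rows = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")
new_rows = new_rows.assign(Status="Not Started")

# Append the new rows to powerapp with a fresh 0..n-1 index.
result = pd.concat([powerapp, new_rows], ignore_index=True)
print(result)
```

Because the comparison only looks at `key_cols`, a OneDrive row that repeats a key combination already present in PowerApps is not reported as new.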

1 Answer:

Answer 0: (score: 0)

I think concat fits here

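import numpy as np
import pandas as pd

# Replace the empty strings in the onedrive dataframe with NaN so they match powerapp
onedrive = onedrive.replace('', np.nan)
# Stack the two frames; rows coming from onedrive have no Status yet
powerapp = pd.concat([onedrive, powerapp])

# Rows without a status came from onedrive, so mark them as 'Not Started'
powerapp['Status'] = powerapp['Status'].fillna('Not Started')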

Then drop the redundant rows based on a subset of columns.
Note: reset the index after combining the frames.
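The de-duplication and index reset could look roughly like the sketch below. It is an illustration under assumptions, not a tested solution for the real data: the frames are small made-up samples, `key_cols` is a guess at which columns identify a row, and `powerapp` is placed first in the concat so that `keep="first"` retains the rows that already carry a Status.

```python
import numpy as np
import pandas as pd

# Hypothetical, cut-down stand-ins for the two real dataframes.
powerapp = pd.DataFrame({
    "Send/Collect": ["Send", "Royal Devon Exeter"],
    "Hospital": ["Nuffield Ipswich", np.nan],
    "Kit": ["ActivMotion (HTO - DFO)", "ActivMotion (HTO - DFO)"],
    "Status": ["In Progress", "Not Started"],
})
onedrive = pd.DataFrame({
    "Send/Collect": ["Send", "Royal Devon Exeter", "Send"],
    "Hospital": ["Nuffield Ipswich", "", "Bridlington"],
    "Kit": ["ActivMotion (HTO - DFO)", "ActivMotion (HTO - DFO)",
            "ToeMotion - MTP DF"],
})

# Assumption: these columns together identify a row.
key_cols = ["Send/Collect", "Hospital", "Kit"]

# Make empty strings comparable with the NaN values used in powerapp.
onedrive = onedrive.replace("", np.nan)

# powerapp goes first so its rows win when duplicates are dropped.
combined = pd.concat([powerapp, onedrive])
combined = combined.drop_duplicates(subset=key_cols, keep="first")

# Whatever still lacks a Status came only from onedrive.
combined["Status"] = combined["Status"].fillna("Not Started")

# Replace the repeated index labels left over from concat with 0..n-1.
combined = combined.reset_index(drop=True)
print(combined)
```

One caveat: rows that are identical on `key_cols` collapse into a single row, so a row that is legitimately repeated in the OneDrive data would not survive this step.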