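I have two dataframes: one is edited by users from a PowerApp, and the other comes directly from OneDrive.
The column headers are almost identical. I need to compare the two dataframes and add any new rows to the dataframe that comes from PowerApps. Here are two sample dataframes:
PowerApps dataframe: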
Send/Collect Hospital Courier Kit Manufacturer Status
0 Send Nuffield Ipswich Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx Not Started
1 Send BMI Rosshal Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx In Progress
2 Collect Stepping Hill Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx Not Started
3 Collect York District Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx Not Started
4 Royal Devon Exeter NaN NaN ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx Not Started
5 collect Spire Bristol Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx Complete
6 Send Bridlington Courier ToeMotion - MTP DF Arthrosurface Hire Log 2018.xlsx Not Started
7 Send Femoral Head Hampshire Clinic DHL Human Tissue Human Tissue Log.xlsx Not Started
OneDrive dataframe:
Send/Collect Hospital Courier Kit Manufacturer
0 Send Nuffield Ipswich Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
1 Send BMI Rosshal Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
2 Collect Stepping Hill Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
3 Collect York District Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
4 Royal Devon Exeter ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
5 collect Spire Bristol Courier ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
6 Royal Devon Exeter ActivMotion (HTO - DFO) NewClip Hire Log 2018.xlsx
7 Send Bridlington Courier ToeMotion - MTP DF Arthrosurface Hire Log 2018.xlsx
8 Send Femoral Head Hampshire Clinic DHL Human Tissue Human Tissue Log.xlsx
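As you can see, the PowerApps dataframe has an extra column, Status (which can hold other values besides "Not Started"), while the OneDrive dataframe has an extra row that needs to go into the PowerApps df.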
Also note that in the OneDrive dataframe empty cells are the empty string "", whereas in the PowerApps dataframe they are NaN.
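To make the two frames comparable, the blanks can be normalised to NaN first. A minimal sketch, using a hypothetical two-row frame in place of the real OneDrive data; note that DataFrame.replace returns a new frame rather than changing the original, so the result has to be reassigned:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for the OneDrive frame; blanks arrive as "".
onedrive = pd.DataFrame({
    "Send/Collect": ["Send", ""],
    "Hospital": ["Nuffield Ipswich", "Royal Devon Exeter"],
})

# replace() returns a new DataFrame, so the result must be reassigned.
onedrive = onedrive.replace("", np.nan)

print(onedrive["Send/Collect"].isna().tolist())  # [False, True]
```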
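I need to merge the extra rows from OneDrive into PowerApps, setting the Status of each added row to "Not Started". I think I need a method that merges based on matches in columns 0, 3 and 4 while ignoring columns 1, 2 and 5.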
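The merge described in the question, matching on columns 0, 3 and 4 by position and ignoring the rest, can be sketched as a left anti-join using pandas' merge with indicator=True. This is a minimal sketch, not the method from the answer below. The function name append_new_rows and the helper column _n are hypothetical. The sketch also assumes that a repeated OneDrive row (such as the two identical Royal Devon Exeter rows) counts as a separate entry, so repeats are numbered with groupby().cumcount() before joining. The sample data is trimmed from the question, on the assumption that the blank cells in the Royal Devon Exeter row are Send/Collect and Courier.

```python
import numpy as np
import pandas as pd


def append_new_rows(powerapp, onedrive, key_positions=(0, 3, 4),
                    default_status="Not Started"):
    """Hypothetical helper: append OneDrive rows with no PowerApps counterpart.

    Rows are compared only on the columns at key_positions; the other
    columns are ignored for matching but kept in the output.
    """
    # OneDrive uses "" for blanks; PowerApps uses NaN. Standardise on NaN.
    onedrive = onedrive.replace("", np.nan)
    keys = list(onedrive.columns[list(key_positions)])

    # Fill NaN with "" in the key copies so blank cells compare equal.
    left = onedrive[keys].fillna("")
    right = powerapp[keys].fillna("")

    # Number repeated keys 0, 1, 2, ... so duplicates pair up one-to-one
    # (assumption: a second identical OneDrive row is a genuinely new entry).
    left["_n"] = left.groupby(keys).cumcount()
    right["_n"] = right.groupby(keys).cumcount()

    # Left join with an indicator column; "left_only" marks unmatched rows.
    matched = left.merge(right, on=keys + ["_n"], how="left", indicator=True)
    is_new = (matched["_merge"] == "left_only").to_numpy()

    new_rows = onedrive[is_new].assign(Status=default_status)
    return pd.concat([powerapp, new_rows], ignore_index=True)


# Trimmed sample based on rows 0, 1, 4 and 6 of the question's data.
kit = "ActivMotion (HTO - DFO)"
maker = "NewClip Hire Log 2018.xlsx"
powerapp = pd.DataFrame({
    "Send/Collect": ["Send", "Send", np.nan],
    "Hospital": ["Nuffield Ipswich", "BMI Rosshal", "Royal Devon Exeter"],
    "Courier": ["Courier", "Courier", np.nan],
    "Kit": [kit] * 3,
    "Manufacturer": [maker] * 3,
    "Status": ["Not Started", "In Progress", "Not Started"],
})
onedrive = pd.DataFrame({
    "Send/Collect": ["Send", "Send", "", ""],
    "Hospital": ["Nuffield Ipswich", "BMI Rosshal",
                 "Royal Devon Exeter", "Royal Devon Exeter"],
    "Courier": ["Courier", "Courier", "", ""],
    "Kit": [kit] * 4,
    "Manufacturer": [maker] * 4,
})

result = append_new_rows(powerapp, onedrive)
print(result[["Hospital", "Status"]])
```

The edited "In Progress" status survives because PowerApps rows are never overwritten, only appended to. With only columns 0, 3 and 4 as keys, rows for different hospitals are told apart purely by how often each key appears. Passing key_positions=(0, 1, 3, 4) would make Hospital part of the match as well.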
Answer 0 (score: 0)
I think concat is a good fit here:
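import numpy as np
import pandas as pd
# Replace the empty strings in the OneDrive dataframe with NaN (replace returns a new frame)
onedrive = onedrive.replace('', np.nan)
# PowerApps rows go first, so dropping duplicates later keeps their Status values
powerapp = pd.concat([powerapp, onedrive])
powerapp['Status'] = powerapp['Status'].fillna('Not Started')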
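Then drop the redundant rows based on a subset of the columns (drop_duplicates with its subset argument).
Note: reset the index after merging (reset_index(drop=True)), since concat keeps each frame's original index labels.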
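Put together, the concat approach from this answer can be sketched as a single function. This is a minimal sketch under stated assumptions: the function name merge_with_concat is hypothetical, the subset used for dropping duplicates is assumed to be every column the two frames share (everything except Status), and the sample is a hypothetical trimmed version of the question's data in which the Bridlington row is missing from PowerApps.

```python
import numpy as np
import pandas as pd


def merge_with_concat(powerapp, onedrive):
    """Hypothetical helper: stack both frames, default Status, drop repeats."""
    # OneDrive blanks are "", PowerApps blanks are NaN: standardise on NaN.
    onedrive = onedrive.replace("", np.nan)
    # Assumption: a row is "the same" if every shared column matches.
    shared = [c for c in onedrive.columns if c in powerapp.columns]
    # PowerApps rows go first so keep="first" preserves their edited Status.
    combined = pd.concat([powerapp, onedrive], ignore_index=True)
    combined["Status"] = combined["Status"].fillna("Not Started")
    combined = combined.drop_duplicates(subset=shared, keep="first")
    # drop_duplicates leaves gaps in the index, hence the reset.
    return combined.reset_index(drop=True)


# Hypothetical trimmed sample: Bridlington is missing from PowerApps.
kit = "ActivMotion (HTO - DFO)"
maker = "NewClip Hire Log 2018.xlsx"
powerapp = pd.DataFrame({
    "Send/Collect": ["Send", "Send"],
    "Hospital": ["Nuffield Ipswich", "BMI Rosshal"],
    "Courier": ["Courier", "Courier"],
    "Kit": [kit, kit],
    "Manufacturer": [maker, maker],
    "Status": ["Not Started", "In Progress"],
})
onedrive = pd.DataFrame({
    "Send/Collect": ["Send", "Send", "Send"],
    "Hospital": ["Nuffield Ipswich", "BMI Rosshal", "Bridlington"],
    "Courier": ["Courier", "Courier", "Courier"],
    "Kit": [kit, kit, "ToeMotion - MTP DF"],
    "Manufacturer": [maker, maker, "Arthrosurface Hire Log 2018.xlsx"],
})

result = merge_with_concat(powerapp, onedrive)
print(result[["Hospital", "Status"]])
```

Because drop_duplicates compares whole rows on the chosen subset, two identical OneDrive rows (like the two Royal Devon Exeter entries in the question) collapse into one. Conversely, a row whose Hospital or Courier was edited in PowerApps no longer matches its OneDrive original, so both copies are kept. Choosing a narrower subset shifts that trade-off.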