I want to merge two data frames, both fetched through SQL, on multiple conditions.
df1 and df2 look like this:
DF1:
Customer ID  Cluster ID  Customer Zone ID
CUS1001.A    CUS1001.X   CUS1000
CUS1001.B    CUS1001.X   CUS1000
CUS1001.C    CUS1001.X   CUS1000
CUS1001.D    CUS1001.X   CUS1000
CUS1001.E    CUS1001.X   CUS1000
CUS2001.A    CUS2001.X   CUS2000
DF2:
Complain ID  RegistrationNumber  Status
CUS3501.A    99231               open
CUS1001.B    21340               open
CUS1001.X    32100               open
I want to merge these two data frames under the following conditions:
if (Complain ID == Customer ID):
    Merge on Customer ID
elif (Complain ID == Cluster ID):
    Merge on Customer ID
elif (Complain ID == Customer Zone ID):
    Merge on Customer ID
else:
    Merge empty row.
The final result should look like this:
Customer ID  Cluster ID  Customer Zone ID  Complain ID  Regi ID  Status
CUS1001.A    CUS1001.X   CUS1000           CUS1001.X    32100    open
CUS1001.B    CUS1001.X   CUS1000           CUS1001.B    21340    open
CUS1001.C    CUS1001.X   CUS1000           CUS1001.X    32100    open
.            .           .                 .            .        .
.            .           .                 .            .        .
CUS2001.A    CUS2001.X   CUS2000           0            0        0
Please help!
Answer 0 (score: 1)
Try using pandas melt, merge, and concat:
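import pandas as pd
cols = ['Complain ID', 'RegistrationNumber', 'Status']
# Long form: one row per (source column, ID value); the Customer ID rows come first.
df = pd.melt(df1)
df = df.merge(df2, left_on='value', right_on='Complain ID', how='left')
# 'number' is the original df1 row position of each melted value.
df['number'] = df.groupby('variable').cumcount()
# Back-fill per original row: a Customer ID match wins, then Cluster ID, then Customer Zone ID.
df[cols] = df.groupby('number')[cols].bfill()
# Keep the Customer ID rows (the first len(df1)); unmatched rows become 0.
Target = pd.concat([df1, df.loc[:len(df1) - 1, cols]], axis=1).fillna(0)
Target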
Out[39]:
Customer ID Cluster ID Customer Zone ID Complain ID RegistrationNumber \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100.0
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340.0
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100.0
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X 32100.0
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X 32100.0
5 CUS2001.A CUS2001.X CUS2000 0 0.0
Status
0 open
1 open
2 open
3 open
4 open
5 0
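For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same melt-and-merge idea. The sample frames are typed in by hand from the question, and the names `long_form`, `first_hit`, and `target` are my own. It swaps the bfill-and-slice step for `GroupBy.first()`, which should pick the first non-null match per original row, and it leaves the unmatched row as NaN instead of filling it with 0.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B', 'CUS1001.C',
                    'CUS1001.D', 'CUS1001.E', 'CUS2001.A'],
    'Cluster ID': ['CUS1001.X'] * 5 + ['CUS2001.X'],
    'Customer Zone ID': ['CUS1000'] * 5 + ['CUS2000'],
})
df2 = pd.DataFrame({
    'Complain ID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})

# Long form that remembers which df1 row each ID value came from.
# melt stacks the columns in order, so Customer ID rows precede Cluster ID rows.
long_form = df1.rename_axis('row').reset_index().melt(id_vars='row')

# Look up every ID value among the complaint IDs; misses become NaN.
long_form = long_form.merge(df2, left_on='value', right_on='Complain ID', how='left')

# first() skips NaN, so each row gets its highest-priority match.
first_hit = long_form.groupby('row')[list(df2.columns)].first()

# Attach the matches back to df1 by row position.
target = df1.join(first_hit)
print(target)
```

The priority order here comes purely from the column order of df1, so reordering the columns would change which match wins.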
Update
Using numpy's intersect1d. Personally, I like this approach best.
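import numpy as np
# IDs from each row (CustomerID, ClusterID) that also occur in df2.ComplainID; the column names here have no spaces.
matches = [np.intersect1d(x, df2.ComplainID.values) for x in df1[['CustomerID', 'ClusterID']].values]
# intersect1d returns sorted values, so m[0] is the smallest match; rows with no match get NaN.
df1['MatchId'] = [m[0] if len(m) else np.nan for m in matches]
df1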
Out[307]:
CustomerID ClusterID CustomerZoneID MatchId
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN
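One thing worth knowing about np.intersect1d: it returns the sorted, unique values common to both inputs. So when a row matches on both of its IDs, the result is in alphabetical order rather than column order. A small check using values from the question (the variable names are just for illustration):

```python
import numpy as np

complaint_ids = np.array(['CUS3501.A', 'CUS1001.B', 'CUS1001.X'])

# A row whose Customer ID and Cluster ID are both complaint IDs.
# The inputs are given Cluster ID first to show that input order is not kept.
both = np.intersect1d(['CUS1001.X', 'CUS1001.B'], complaint_ids)

# A row with no matching ID at all gives an empty array.
no_match = np.intersect1d(['CUS2001.A', 'CUS2001.X'], complaint_ids)

print(both)
print(no_match.size)
```

In this data the alphabetical order happens to put the Customer ID first, which is why row 1 shows CUS1001.B, but that is not guaranteed in general.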
df1.merge(df2,left_on='MatchId',right_on='ComplainID',how='left')
Out[311]:
CustomerID ClusterID CustomerZoneID MatchId ComplainID \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN NaN
RegistrationNumber Status
0 32100.0 open
1 21340.0 open
2 32100.0 open
3 32100.0 open
4 32100.0 open
5 NaN NaN
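If the Customer Zone ID should be checked as well, and the priority order should be explicit rather than alphabetical, something along these lines might work. This is only a sketch: the helper `first_matching_id` and the `priority` list are my own names, and the frames are retyped from the question with the space-free column names used in this update.

```python
import pandas as pd

df1 = pd.DataFrame({
    'CustomerID': ['CUS1001.A', 'CUS1001.B', 'CUS1001.C',
                   'CUS1001.D', 'CUS1001.E', 'CUS2001.A'],
    'ClusterID': ['CUS1001.X'] * 5 + ['CUS2001.X'],
    'CustomerZoneID': ['CUS1000'] * 5 + ['CUS2000'],
})
df2 = pd.DataFrame({
    'ComplainID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})


def first_matching_id(frame, columns, valid_ids):
    """Per row, return the first value among `columns` found in `valid_ids` (NaN if none)."""
    # For each column, keep the ID only where it is a valid ID, else NaN.
    hits = [frame[col].where(frame[col].isin(valid_ids)) for col in columns]
    match = hits[0]
    for hit in hits[1:]:
        # Only rows still unmatched are filled from lower-priority columns.
        match = match.fillna(hit)
    return match


# Highest priority first (hypothetical ordering taken from the question's if/elif chain).
priority = ['CustomerID', 'ClusterID', 'CustomerZoneID']
df1['MatchId'] = first_matching_id(df1, priority, df2['ComplainID'].tolist())

result = df1.merge(df2, left_on='MatchId', right_on='ComplainID', how='left')
print(result)
```

Passing the complaint IDs as a plain list avoids any index alignment inside isin.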