我有一个parent_df和一个child_df,如下所示。
parent_df:
x y colA
x1 y1 A1
x1 y2 A2
x2 y1 A3
x2 y2 A4
child_df:
p q colB colC
p1 q1 B1 C1
p1 q2 B2 C2
p2 q1 B3 C3
p2 q2 B4 C4
我想修改parent_df或通过将child_df放入parent_df(x2,y1)中特定行的parent_df来创建新的parent_df,以便:
parent_df:
x y p q colA colB colC
x1 y1 A1 NA NA
x1 y2 A2 NA NA
x2 y1 p1 q1 A3 B1 C1
p1 q2 A3 B2 C2
p2 q1 A3 B3 C3
p2 q2 A3 B4 C4
x2 y2 A4 NA NA
有办法做到这一点吗?
答案 0 :(得分:1)
我认为merge
需要sort_index
:
print (parent_df)
colA
x y
x1 y1 A1
y2 A2
x2 y1 A3
y2 A4
print (child_df)
colB colC
p q
p1 q1 B1 C1
q2 B2 C2
p2 q1 B3 C3
q2 B4 C4
#create new columns
child_df['x'] = 'x2'
child_df['y'] = 'y1'
#set index by new columns
child_df = child_df.reset_index().set_index(['x','y'])
print (child_df)
p q colB colC
x y
x2 y1 p1 q1 B1 C1
y1 p1 q2 B2 C2
y1 p2 q1 B3 C3
y1 p2 q2 B4 C4
df = pd.merge(parent_df, child_df, left_index=True, right_index=True, how='outer')
#replace NaN in p. q columns with '', append and sort index
df = df.fillna({'p':'','q':''}).set_index(['p','q'], append=True).sort_index()
print (df)
colA colB colC
x y p q
x1 y1 A1 NaN NaN
y2 A2 NaN NaN
x2 y1 p1 q1 A3 B1 C1
q2 A3 B2 C2
p2 q1 A3 B3 C3
q2 A3 B4 C4
y2 A4 NaN NaN