I am trying to join two DataFrames, df1 and df2. df1 is a MultiIndex DataFrame, and df2 has fewer rows than df1.
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df1 = pd.DataFrame(np.random.randn(8), index=index)
df1
Out[15]:
                     0
first second
bar   one    -0.185560
      two    -2.358254
baz   one     1.130550
      two     1.441708
foo   one    -1.163076
      two     1.776814
qux   one    -0.811836
      two     0.389500
df2 = pd.DataFrame(data=[0,1,0,1],index=['bar','baz','foo', 'qux'],columns=['label'])
df2
Out[18]:
     label
bar      0
baz      1
foo      0
qux      1
The desired result would be:
df3
Out[18]:
                     0  label
first second
bar   one    -0.185560      0
      two    -2.358254      0
baz   one     1.130550      1
      two     1.441708      1
foo   one    -1.163076      0
      two     1.776814      0
qux   one    -0.811836      1
      two     0.389500      1
Answer 0 (score: 2)
In [132]: df1['label'] = df1.index.get_level_values(0).to_series().map(df2['label']).values
In [133]: df1
Out[133]:
                     0  label
first second
bar   one     0.143211      0
      two     1.133454      0
baz   one     1.298973      1
      two    -0.717844      1
foo   one    -0.663768      0
      two     0.687015      0
qux   one     0.412729      1
      two     0.366502      1
Or, a better option (thanks to @Dark for the hint):
df1['label'] = df1.index.get_level_values(0).map(df2['label'].get)
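For anyone who wants to compare the two variants side by side, here is a minimal self-contained sketch. It assumes the same shape of data as in the question, but uses np.arange instead of random values and MultiIndex.from_product instead of from_tuples so the result is reproducible; the names first_level, via_series and via_index are only illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as in the question, with deterministic values (assumption).
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df1 = pd.DataFrame(np.arange(8.0), index=index)
df2 = pd.DataFrame({"label": [0, 1, 0, 1]}, index=["bar", "baz", "foo", "qux"])

# Values of the first index level, one entry per row of df1.
first_level = df1.index.get_level_values("first")

# Variant 1: convert to a Series, look each value up in df2['label'],
# then take .values so the assignment is positional, not index-aligned.
via_series = first_level.to_series().map(df2["label"]).values

# Variant 2: map the Index directly using the Series' .get lookup.
via_index = first_level.map(df2["label"].get)

df1["label"] = via_index
print(df1)
```

Both variants look up each first-level value in df2['label'], so every row of df1 receives the label of its outer key.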
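Answer 1 (score: 2)
Another approach is to call reset_index on the second level only. The frame is then indexed by the first level alone, so you can simply add the column, which aligns on the first-level index values, and then set the index again: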
In[52]:
df3 = df1.reset_index(level=1)
df3['label'] = df2['label']
df3 = df3.set_index([df3.index, 'second'])
df3
Out[52]:
                     0  label
first second
bar   one     0.957417      0
      two    -0.466755      0
baz   one     1.064326      1
      two     1.036983      1
foo   one    -1.319737      0
      two     0.064465      0
qux   one    -0.237232      1
      two    -0.511889      1
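The same steps as a self-contained sketch, again assuming deterministic data in place of the random values from the question. One deliberate variation, not part of the answer above: the last step uses set_index('second', append=True), which should give the same two-level index as passing [df3.index, 'second'].

```python
import numpy as np
import pandas as pd

# Same layout as in the question, with deterministic values (assumption).
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df1 = pd.DataFrame(np.arange(8.0), index=index)
df2 = pd.DataFrame({"label": [0, 1, 0, 1]}, index=["bar", "baz", "foo", "qux"])

# Step 1: move 'second' into a column; the frame is now indexed by 'first'.
df3 = df1.reset_index(level="second")

# Step 2: column assignment aligns df2['label'] on the repeated 'first' values.
df3["label"] = df2["label"]

# Step 3: append 'second' back onto the index to restore the MultiIndex.
df3 = df3.set_index("second", append=True)
print(df3)
```

Unlike the map-based answer, this leaves df1 itself unchanged and builds the result in a new frame.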