I have two dataframes:
DF ONE:
ID A B C
1 x y z
1 x y z
2 x y z
2 x y z
2 x y z
3 x y z
DF TWO:
ID D E F
1 a b c1
2 a b c2
3 a b c3
I want to take column F from DF TWO and put it on DF ONE wherever the IDs match, so that I get this output:
ID A B C F
1 x y z c1
1 x y z c1
2 x y z c2
2 x y z c2
2 x y z c2
3 x y z c3
Thanks for your help.
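To experiment with the answers below, the two frames from the question can be rebuilt as follows. This is a minimal sketch that simply transcribes the question's tables; the names `df1` and `df2` match the ones the answers use.

```python
import pandas as pd

# DF ONE from the question: repeated IDs, constant A/B/C values
df1 = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 3],
    'A': ['x'] * 6,
    'B': ['y'] * 6,
    'C': ['z'] * 6,
})

# DF TWO from the question: exactly one row per ID
df2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'D': ['a'] * 3,
    'E': ['b'] * 3,
    'F': ['c1', 'c2', 'c3'],
})

print(df1)
print(df2)
```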
Answer 0 (score: 5)
You can use map with a dict:
d = df2.set_index('ID')['F'].to_dict()
print (d)
{1: 'c1', 2: 'c2', 3: 'c3'}
df1['F'] = df1['ID'].map(d)
print (df1)
ID A B C F
0 1 x y z c1
1 1 x y z c1
2 2 x y z c2
3 2 x y z c2
4 2 x y z c2
5 3 x y z c3
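One behaviour of the dict-based map worth knowing: any ID with no key in the dict becomes NaN in the new column. The sketch below adds a hypothetical ID 4 to df1 (it is not in the question's data) to show this, and then fills the gap with an assumed placeholder string via fillna.

```python
import pandas as pd

# df1 with an extra, hypothetical ID 4 that does not appear in df2
df1 = pd.DataFrame({'ID': [1, 1, 2, 3, 4]})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'F': ['c1', 'c2', 'c3']})

d = df2.set_index('ID')['F'].to_dict()

# IDs that are missing from the dict are mapped to NaN
df1['F'] = df1['ID'].map(d)
print(df1)

# Optionally replace the missing values with a placeholder of your choice
df1['F_filled'] = df1['ID'].map(d).fillna('missing')
print(df1)
```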
Another solution is to map with a Series:
s = df2.set_index('ID')['F']
print (s)
ID
1 c1
2 c2
3 c3
Name: F, dtype: object
df1['F'] = df1['ID'].map(s)
print (df1)
ID A B C F
0 1 x y z c1
1 1 x y z c1
2 2 x y z c2
3 2 x y z c2
4 2 x y z c2
5 3 x y z c3
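The Series-based map looks values up by index label, so it assumes each ID appears only once in df2. If df2 could contain duplicate IDs (a hypothetical case, not in the question's data), pandas raises an error rather than guessing which value to use. One option, sketched here under the assumption that the first row per ID should win, is to drop duplicates before building the lookup Series.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 1, 2, 3]})
# Hypothetical df2 in which ID 2 appears twice with different F values
df2 = pd.DataFrame({'ID': [1, 2, 2, 3], 'F': ['c1', 'c2', 'c2b', 'c3']})

# Keep only the first row for each ID so the lookup index is unique
s = df2.drop_duplicates(subset='ID', keep='first').set_index('ID')['F']
assert s.index.is_unique

df1['F'] = df1['ID'].map(s)
print(df1)
```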
Timings:
#[60000 rows x 5 columns]
df1 = pd.concat([df1]*10000).reset_index(drop=True)
In [115]: %timeit pd.merge(df1, df2[['ID', 'F']],how='left')
100 loops, best of 3: 11.1 ms per loop
In [116]: %timeit df1['ID'].map(df2.set_index('ID')['F'])
100 loops, best of 3: 3.18 ms per loop
In [117]: %timeit df1['ID'].map(df2.set_index('ID')['F'].to_dict())
100 loops, best of 3: 3.36 ms per loop
In [118]: %timeit df1['ID'].map({k:v for k, v in df2[['ID', 'F']].as_matrix()})
100 loops, best of 3: 3.44 ms per loop
In [119]: %%timeit
...: df2.index = df2['ID']
...: df1['F1'] = df1['ID'].map(df2['F'])
...:
100 loops, best of 3: 3.33 ms per loop
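The `%timeit` lines above are IPython magics. Outside IPython, a roughly comparable benchmark can be sketched with the standard-library `timeit` module. This is an assumed setup, not the one that produced the numbers above; results depend on the machine and pandas version, so none are claimed here. The `as_matrix` variant is left out because `DataFrame.as_matrix` was removed in pandas 1.0.

```python
import timeit

import pandas as pd

# Frames from the question, then df1 enlarged to 60,000 rows as in the timings above
df1 = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 3], 'A': ['x'] * 6, 'B': ['y'] * 6, 'C': ['z'] * 6})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'D': ['a'] * 3, 'E': ['b'] * 3, 'F': ['c1', 'c2', 'c3']})
df1 = pd.concat([df1] * 10000).reset_index(drop=True)

candidates = {
    'merge': lambda: pd.merge(df1, df2[['ID', 'F']], how='left')['F'],
    'map Series': lambda: df1['ID'].map(df2.set_index('ID')['F']),
    'map dict': lambda: df1['ID'].map(df2.set_index('ID')['F'].to_dict()),
}

# Sanity check: every approach must produce the same column, in df1's row order
reference = candidates['map Series']()
for name, fn in candidates.items():
    assert fn().tolist() == reference.tolist(), name

for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, reported as time per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{name:12s} {best * 1e3:.2f} ms per call')
```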
Answer 1 (score: 2)
You need to create a mapping from df2, which you can do like this:
mapping = {k: v for k, v in df2[['ID', 'F']].to_numpy()}
Then apply it to df1:
df1['F'] = df1['ID'].map(mapping)
Or you can use:
df1 = pd.merge(df1, df2[['ID', 'F']],how='left')
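A minimal sketch comparing the two options in this answer on the question's data. It builds the mapping with `dict(zip(...))`, an alternative to the comprehension that avoids the NumPy round-trip, and passes merge's `validate='many_to_one'` so that a repeated ID in df2 raises a MergeError instead of silently duplicating rows of df1. Both choices are suggestions, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 3], 'A': ['x'] * 6, 'B': ['y'] * 6, 'C': ['z'] * 6})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'D': ['a'] * 3, 'E': ['b'] * 3, 'F': ['c1', 'c2', 'c3']})

# Same ID -> F mapping as the comprehension, built directly from the two columns
mapping = dict(zip(df2['ID'], df2['F']))
via_map = df1['ID'].map(mapping)

# Left merge on ID; validate='many_to_one' checks that each ID occurs at most once in df2
merged = pd.merge(df1, df2[['ID', 'F']], on='ID', how='left', validate='many_to_one')

print(merged)
```

With unique IDs in df2, both approaches give the same F column and the same number of rows as df1.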
Answer 2 (score: 1)
You can use map after setting ID as the index of dataframe TWO:
df2.index = df2['ID']