Mapping between DataFrames using multiple columns as the key in pandas

Time: 2017-03-28 08:13:54

Tags: python pandas

>>> p1.head()           
   StreamId            Timestamp    SeqNum
0         3  1490250116391063414  1158
1         3  1490250116391348339  3600
2         3  1490250116391542829  3600
3         3  1490250116391577184  1437
4         3  1490250116392819426  1389


>>> oss.head()
   OrderID    Symbol  Stream     SeqNo
0  5000000  AXBANK       3      1158
1  5000001  AXBANK       6      1733
2  5000002  AXBANK       6      1244
3  5000003  AXBANK       6      1388
4  5000004  AXBANK       3      1389

How can I merge these two DataFrames using two attributes as the key (SeqNum and StreamId), so that the result looks like this?

>>> merge
   OrderID    Symbol  Stream     SeqNo    Timestamp
0  5000000  AXBANK       3      1158      1490250116391063414
1  5000001  AXBANK       6      1733      NaN
2  5000002  AXBANK       6      1244      NaN
3  5000003  AXBANK       6      1388      NaN
4  5000004  AXBANK       3      1389      1490250116392819426

I tried using

oss['Time1'] = oss['SeqNo'].map(p1.set_index('SeqNum')['Timestamp'])

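but I need to include both pairs (SeqNum-SeqNo and Stream-StreamId) as the key. I know this would be easy if I renamed the columns to match in both DataFrames and used merge, but I want to avoid that. I would rather use something generic: take this DataFrame, map THESE columns to THOSE columns in another DataFrame, and fetch what is needed.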

2 answers:

Answer 0 (score: 4)

Use join, setting the key columns of p1 as its index and naming the matching columns of oss in on:

oss.join(p1.set_index(['StreamId', 'SeqNum']), on=['Stream', 'SeqNo'])

   OrderID  Symbol  Stream  SeqNo     Timestamp
0  5000000  AXBANK       3   1158  1.490250e+18
1  5000001  AXBANK       6   1733           NaN
2  5000002  AXBANK       6   1244           NaN
3  5000003  AXBANK       6   1388           NaN
4  5000004  AXBANK       3   1389  1.490250e+18
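
One thing to watch in the output above: Timestamp comes back as float64 (1.490250e+18) because the unmatched rows are filled with NaN, and float64 generally cannot hold 19-digit integers exactly. A minimal sketch of one way around this, assuming a pandas version that has the nullable Int64 dtype (1.0 or later); the name p1_exact is only for illustration, and the frames are rebuilt from the sample rows in the question:

```python
import pandas as pd

# Sample data copied from the question.
p1 = pd.DataFrame({
    'StreamId': [3, 3, 3, 3, 3],
    'Timestamp': [1490250116391063414, 1490250116391348339,
                  1490250116391542829, 1490250116391577184,
                  1490250116392819426],
    'SeqNum': [1158, 3600, 3600, 1437, 1389],
})
oss = pd.DataFrame({
    'OrderID': [5000000, 5000001, 5000002, 5000003, 5000004],
    'Symbol': ['AXBANK'] * 5,
    'Stream': [3, 6, 6, 6, 3],
    'SeqNo': [1158, 1733, 1244, 1388, 1389],
})

# Nullable integer dtype: a missing match becomes <NA> instead of
# forcing the whole column to float64.
p1_exact = p1.astype({'Timestamp': 'Int64'})

# Same join as in the answer, on the two-column key.
joined = oss.join(p1_exact.set_index(['StreamId', 'SeqNum']),
                  on=['Stream', 'SeqNo'])
print(joined)
```

With this, the matched rows keep their full nanosecond values and the unmatched rows show as missing.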

Answer 1 (score: 2)

I think you need merge with left_on and right_on, followed by drop to remove the duplicated key columns:

print (pd.merge(oss, p1, left_on=['Stream','SeqNo'], 
                         right_on=['StreamId','SeqNum'],how='left')
          .drop(['StreamId','SeqNum'], axis=1))

   OrderID  Symbol  Stream  SeqNo     Timestamp
0  5000000  AXBANK       3   1158  1.490250e+18
1  5000001  AXBANK       6   1733           NaN
2  5000002  AXBANK       6   1244           NaN
3  5000003  AXBANK       6   1388           NaN
4  5000004  AXBANK       3   1389  1.490250e+18
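
If it helps to see which rows of oss found no partner in p1, merge also accepts indicator=True, which adds a _merge column. A small sketch on the sample rows from the question (the variable names merged and unmatched are just illustrative):

```python
import pandas as pd

# Sample data copied from the question.
p1 = pd.DataFrame({
    'StreamId': [3, 3, 3, 3, 3],
    'Timestamp': [1490250116391063414, 1490250116391348339,
                  1490250116391542829, 1490250116391577184,
                  1490250116392819426],
    'SeqNum': [1158, 3600, 3600, 1437, 1389],
})
oss = pd.DataFrame({
    'OrderID': [5000000, 5000001, 5000002, 5000003, 5000004],
    'Symbol': ['AXBANK'] * 5,
    'Stream': [3, 6, 6, 6, 3],
    'SeqNo': [1158, 1733, 1244, 1388, 1389],
})

# Left merge on the two-column key; indicator=True records, per row,
# whether a match was found ('both') or not ('left_only').
merged = (pd.merge(oss, p1,
                   left_on=['Stream', 'SeqNo'],
                   right_on=['StreamId', 'SeqNum'],
                   how='left', indicator=True)
            .drop(['StreamId', 'SeqNum'], axis=1))

# Orders that have no timestamp in p1.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['OrderID', 'Stream', 'SeqNo']])
```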

Another solution, renaming the columns so that merge finds the common keys by itself:

d = {'Stream':'StreamId','SeqNo':'SeqNum'}
print (pd.merge(oss.rename(columns=d), p1, how='left'))
   OrderID  Symbol  StreamId  SeqNum     Timestamp
0  5000000  AXBANK         3    1158  1.490250e+18
1  5000001  AXBANK         6    1733           NaN
2  5000002  AXBANK         6    1244           NaN
3  5000003  AXBANK         6    1388           NaN
4  5000004  AXBANK         3    1389  1.490250e+18
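
For the "generic" form asked about in the question (map THESE columns to THOSE columns and fetch what is needed), the merge above can be wrapped in a small function. This is only a sketch: lookup_columns, key_map and fetch are hypothetical names, not part of pandas, and it assumes the fetched column names do not collide with columns already in the left frame.

```python
import pandas as pd

def lookup_columns(left, right, key_map, fetch):
    """Hypothetical helper: left-join the `fetch` columns of `right`
    onto `left`. `key_map` maps left key names to right key names."""
    left_keys = list(key_map)
    right_keys = [key_map[k] for k in left_keys]
    # Take only the key columns and the requested columns from `right`.
    subset = right[right_keys + list(fetch)]
    merged = left.merge(subset, left_on=left_keys,
                        right_on=right_keys, how='left')
    # Remove the right-hand key columns that were added under a different name.
    extra = [r for l, r in zip(left_keys, right_keys) if r != l]
    return merged.drop(columns=extra)

# Sample data copied from the question.
p1 = pd.DataFrame({
    'StreamId': [3, 3, 3, 3, 3],
    'Timestamp': [1490250116391063414, 1490250116391348339,
                  1490250116391542829, 1490250116391577184,
                  1490250116392819426],
    'SeqNum': [1158, 3600, 3600, 1437, 1389],
})
oss = pd.DataFrame({
    'OrderID': [5000000, 5000001, 5000002, 5000003, 5000004],
    'Symbol': ['AXBANK'] * 5,
    'Stream': [3, 6, 6, 6, 3],
    'SeqNo': [1158, 1733, 1244, 1388, 1389],
})

result = lookup_columns(oss, p1,
                        key_map={'Stream': 'StreamId', 'SeqNo': 'SeqNum'},
                        fetch=['Timestamp'])
print(result)
```

The original column names of oss are kept, so nothing has to be renamed in either DataFrame.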