PANDAS vlookup using map

Posted: 2017-09-13 21:45:26

Tags: python pandas

import pandas as pd
import numpy as np

pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}

data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}

pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

I know I can use something like:

df['mark_up_id'].map(pb['mark_up'])

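to perform a v-lookup. I want to take the mark-up this returns and multiply each cost by it, matching on the common mark_up_id, to produce a new column called price.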

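I know I could merge the two and then run the calculation; that is how I produced the desired output. I'd like to do it the way you might loop over a dictionary, use each key to look up a value in another dictionary, and do some calculation inside the loop. Given that pandas DataFrames sit on top of dictionaries, there must be a way to do this with some combination of join / map / apply, without actually joining the two datasets in memory.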
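
The dictionary analogy can be made concrete. The sketch below reuses the question's pb and data and introduces a hypothetical plain dict, mark_up_lookup; it shows an explicit lookup loop and the map one-liner producing the same prices (NaN where a mark_up_id has no mark-up in pb).

```python
import numpy as np
import pandas as pd

# The question's sample data
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

# Hypothetical plain dict: mark_up_id -> mark_up
mark_up_lookup = pb['mark_up'].to_dict()

# Explicit loop: look each key up in the dict and compute the price inline
loop_prices = []
for key, cost in zip(df['mark_up_id'], df['cost']):
    rate = mark_up_lookup.get(key)  # None when the id has no mark-up
    loop_prices.append(cost * rate if rate is not None else np.nan)

# Vectorised equivalent: map looks every key up in pb's index at once,
# and the result stays aligned with df's own row index
map_prices = df['mark_up_id'].map(pb['mark_up']) * df['cost']
```

Both give the same numbers; map just replaces the Python-level loop, and neither version builds a merged DataFrame of the two inputs.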

Desired output:

desired_output = {"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48},"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73"},"mark_up_id":{"0":"123","1":"456","2":"111","3":"123","4":"789"},"price":{"0":38.623338,"1":14.71875,"2":12.559686,"3":12.233754,"4":12.985704}}
do = pd.DataFrame(data=desired_output)

Bonus points:

Explain the difference between the accepted answer and...

pb.loc[df['mark_up_id']]['mark_up'] * df.set_index('mark_up_id')['cost']

...and why the following lambda function, which I got from above, raises an error...

df.apply(lambda x : x['cost']*pb.loc[x['mark_up_id']],axis=1 )

It returns an error that says:

KeyError: ('the label [333] is not in the [index]', u'occurred at index 5')
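
Both bonus questions come down to the same thing: pb only has mark-ups for five ids, while df also contains ids such as 333 (row 5). Label-based .loc raises KeyError for a label that is not in the index, which is what pb.loc[x['mark_up_id']] hits at row 5; in current pandas versions, pb.loc[df['mark_up_id']] with a list containing missing labels raises KeyError as well. map, by contrast, returns NaN for missing keys. The sketch below, using the question's data, reproduces the error and shows reindex as the forgiving counterpart to the .loc expression. Its result is indexed by mark_up_id rather than by df's rows (which is also why the bonus expression multiplies against df.set_index('mark_up_id')), so here it is combined positionally instead.

```python
import numpy as np
import pandas as pd

# The question's sample data
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

# A scalar .loc lookup of an id that pb does not contain raises KeyError,
# which is what the apply/lambda in the question hits at row 5
try:
    pb.loc['333']
    raised_key_error = False
except KeyError:
    raised_key_error = True

# reindex looks up every id and fills missing ones with NaN instead of raising;
# the result is indexed by mark_up_id, not by df's row labels
rates = pb['mark_up'].reindex(df['mark_up_id'])

# Combine positionally so the prices line up with df's rows
prices = pd.Series(rates.to_numpy() * df['cost'].to_numpy(), index=df.index)
```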

4 answers:

Answer 0 (score: 3)

Try

df['price'] = df['mark_up_id'].map(pb['mark_up']) * df['cost']

You get

    cost    id  mark_up_id  price
0   29.74   K69 123         38.623338
1   9.42    K70 456         14.718750
2   9.42    K71 111         12.559686
3   9.42    K72 123         12.233754
4   9.48    K73 789         12.985704
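
The table above appears to match an earlier version of the question's data. With the data dict as posted, ids 333 through 777 have no entry in pb, so map leaves their price as NaN rather than raising. A quick sketch, using the question's data, to list the affected rows:

```python
import pandas as pd

# The question's sample data
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

# This answer's one-liner
df['price'] = df['mark_up_id'].map(pb['mark_up']) * df['cost']

# map returns NaN for ids missing from pb's index; collect those ids
unmatched_ids = df.loc[df['price'].isna(), 'mark_up_id'].tolist()
```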

Answer 1 (score: 2)

UPDATE for the updated question:

In [79]: df = df.assign(price=df['mark_up_id'].map(pb['mark_up']) * df['cost']).dropna()

In [80]: df
Out[80]:
    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
1   9.42  K70        456  14.718750
2   9.42  K71        789  12.903516
3   9.42  K72        111  12.559686
4   9.48  K73        222  13.830372

Old answer:

In [67]: df = df.assign(price=df['mark_up_id'].map(pb['mark_up']) * df['cost'])

In [68]: df
Out[68]:
    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
1   9.42  K70        456  14.718750
2   9.42  K71        111  12.559686
3   9.42  K72        123  12.233754
4   9.48  K73        789  12.985704

Answer 2 (score: 2)

Using merge:

df1 = pb  # the mark-up lookup table from the question
df = df.merge(df1, left_on='mark_up_id', right_index=True)
df.assign(price=df['cost'].mul(df['mark_up'])).drop(columns='mark_up')
Out[254]: 
    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
3   9.42  K72        123  12.233754
1   9.42  K70        456  14.718750
2   9.42  K71        111  12.559686
4   9.48  K73        789  12.985704
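
merge defaults to an inner join, so rows whose mark_up_id is missing from the lookup table are dropped from the result, much like the .dropna() in the previous answer. If the unmatched rows should be kept with a NaN price, how='left' does that. A sketch with the question's data, using pb directly in place of df1:

```python
import pandas as pd

# The question's sample data
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

# Default how='inner': only rows whose id exists in pb survive
inner = df.merge(pb, left_on='mark_up_id', right_index=True)

# how='left': every row of df is kept; mark_up is NaN where there is no match
left = df.merge(pb, left_on='mark_up_id', right_index=True, how='left')
left = left.assign(price=left['cost'] * left['mark_up']).drop(columns='mark_up')
```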

If you really want to use apply with a lambda, it's really ugly... really...

df.apply(lambda x : x['cost']*df1.loc[x['mark_up_id']],axis=1 )

Change it to this (even uglier... T_T):

df.apply(lambda x :x['cost']*df1.loc[x['mark_up_id']] if pd.Series(x['mark_up_id']).isin(df1.index)[0] else np.nan,axis=1 )
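
A less ugly way to keep the row-wise apply is to ask the lookup Series for a default instead of testing membership first: Series.get(key, default) returns the default when the label is missing. The sketch below uses the question's pb (the answer's df1) and a hypothetical helper, row_price. It gives the same numbers as the map version, but it calls Python once per row, so map remains the better choice.

```python
import numpy as np
import pandas as pd

# The question's sample data
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

mark_up = pb['mark_up']

def row_price(row):
    """Hypothetical helper: cost times mark-up, NaN when the id has no mark-up."""
    rate = mark_up.get(row['mark_up_id'], np.nan)  # no KeyError for missing ids
    return row['cost'] * rate

apply_prices = df.apply(row_price, axis=1)
map_prices = df['mark_up_id'].map(mark_up) * df['cost']
```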

Answer 3 (score: 0)

df['price'] = df['cost'] * df['mark_up_id'].map(pb['mark_up'])

Now df will contain your desired output.