Compute the top 10 products for a given query by priority in Python

Time: 2016-07-28 07:33:01

Tags: python pandas

Suppose we are given the following DataFrame:

                       Query  Productid  priority
index
0                        3ds    2125233  0.018946
1                        rca    2009324  0.027599
2                       nook    1517163  0.009443
3                        rca    2877125  0.012054
4                        rca    2877134  0.005557
5              flatscreentvs    2416092  0.011961
6                    macbook    3108172  0.010459
7                        3ds    2264036  0.165948
8                        rca    8280834  0.004006
9                 memorycard    2740208  0.013744
10               acpowercord    2584273  0.006865
11                zaggiphone    1230537  0.136073
12            watchthethrone    3168067  0.104679
13     remotecontrolextender    7997055  0.113058
14                 camcorder    2009041  0.017809
15                       3ds    1988047  0.031711
16                       3ds    1686079  0.043783
17        wirelessheadphones    3770439  0.014714
18        wirelessheadphones    2602403  0.008525
19                 samsung40    2126065  0.018066

I want to find the top 2 product IDs for a given query, based on priority.

For example, if we have query=3ds, then the top two products should be:

1. 1988047 
2. 1686079 

2 Answers:

Answer 0 (score: 1)

IIUC, use:

print (df.set_index('Productid').groupby('Query')['priority'].nlargest(2).reset_index())
                    Query  Productid  priority
0                     3ds    2264036  0.165948
1                     3ds    1686079  0.043783
2             acpowercord    2584273  0.006865
3               camcorder    2009041  0.017809
4           flatscreentvs    2416092  0.011961
5                 macbook    3108172  0.010459
6              memorycard    2740208  0.013744
7                    nook    1517163  0.009443
8                     rca    2009324  0.027599
9                     rca    2877125  0.012054
10  remotecontrolextender    7997055  0.113058
11              samsung40    2126065  0.018066
12         watchthethrone    3168067  0.104679
13     wirelessheadphones    3770439  0.014714
14     wirelessheadphones    2602403  0.008525
15             zaggiphone    1230537  0.136073
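
If only one query is of interest, the same idea can be applied after filtering. The following is a minimal sketch, assuming the column names from the question; `top_products` is a hypothetical helper name (not part of pandas), and the DataFrame is a reduced sample of the question's data:

```python
import pandas as pd

# Reduced sample of the question's data (assumption: same column names).
df = pd.DataFrame({
    'Query': ['3ds', 'rca', 'nook', 'rca', 'rca', '3ds', '3ds', '3ds'],
    'Productid': [2125233, 2009324, 1517163, 2877125,
                  2877134, 2264036, 1988047, 1686079],
    'priority': [0.018946, 0.027599, 0.009443, 0.012054,
                 0.005557, 0.165948, 0.031711, 0.043783],
})


def top_products(frame, query, n=2):
    """Return the Productid values of the n highest-priority rows for one query."""
    # Keep only the rows that belong to the requested query.
    subset = frame[frame['Query'] == query]
    # DataFrame.nlargest sorts by the given column in descending order.
    return subset.nlargest(n, 'priority')['Productid'].tolist()


print(top_products(df, '3ds'))
```

Filtering first avoids grouping the whole frame when a single query is needed; for all queries at once, the groupby version above is the more natural fit.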

Answer 1 (score: 0)

This is the equivalent of Oracle's row_number() analytic function:

In [172]: df.assign(rn=df.sort_values('priority', ascending=0).groupby('Query').cumcount() + 1).query('rn < 3').sort_values(['Query','rn'])
Out[172]:
                       Query  Productid  priority  rn
index
7                        3ds    2264036  0.165948   1
16                       3ds    1686079  0.043783   2
10               acpowercord    2584273  0.006865   1
14                 camcorder    2009041  0.017809   1
5              flatscreentvs    2416092  0.011961   1
6                    macbook    3108172  0.010459   1
9                 memorycard    2740208  0.013744   1
2                       nook    1517163  0.009443   1
1                        rca    2009324  0.027599   1
3                        rca    2877125  0.012054   2
13     remotecontrolextender    7997055  0.113058   1
19                 samsung40    2126065  0.018066   1
12            watchthethrone    3168067  0.104679   1
17        wirelessheadphones    3770439  0.014714   1
18        wirelessheadphones    2602403  0.008525   2
11                zaggiphone    1230537  0.136073   1
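
A per-group row number can also be produced with the groupby `rank` method instead of sorting and calling `cumcount`. This is only a sketch of an alternative, not the answer's own code; it assumes the question's column names, uses a reduced sample of the data, and the name `top2` is chosen here for illustration:

```python
import pandas as pd

# Reduced sample of the question's data (assumption: same column names).
df = pd.DataFrame({
    'Query': ['3ds', 'rca', 'nook', 'rca', 'rca', '3ds', '3ds', '3ds'],
    'Productid': [2125233, 2009324, 1517163, 2877125,
                  2877134, 2264036, 1988047, 1686079],
    'priority': [0.018946, 0.027599, 0.009443, 0.012054,
                 0.005557, 0.165948, 0.031711, 0.043783],
})

# method='first' breaks ties by order of appearance, so every row in a group
# gets a distinct rank; ascending=False gives rank 1 to the highest priority.
rn = (df.groupby('Query')['priority']
        .rank(method='first', ascending=False)
        .astype(int))

# Keep the two best rows per query and order them for display.
top2 = df.assign(rn=rn).query('rn <= 2').sort_values(['Query', 'rn'])
print(top2)
```

Because `rank` returns a Series aligned with the original index, no prior sort is needed before assigning the `rn` column.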