Applying a lookup table to bins or ranges in a DataFrame

Time: 2017-03-27 14:31:26

Tags: python python-3.x pandas

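I have a DataFrame like the one printed as `a` in the attempts below. Assume these are the sales amounts for a list of salespeople.

In addition, I have a lookup table (printed as `b` below) that holds the commission rate by sales amount. So $0-$50,000 = 5%, $50,001-$250,000 = 4%, and so on.

What I want to do is apply the lookup table to the sales table, producing a DataFrame of the same shape in which each sales amount is replaced by its commission rate.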

Attempt 1:

In [66]: a
Out[66]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [67]: b
Out[67]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):

  File "<ipython-input-68-d229bce29f01>", line 1, in <module>
    c = b['Commission'][a <= b.index.values]

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
    res = self._combine_const(other, func, raise_on_error=False)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
    raise_on_error=raise_on_error)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
    return self.apply('eval', **kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
    result = get_result(other)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
    result = func(values, other)

ValueError: operands could not be broadcast together with shapes (3,3) (4,) 
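The ValueError in Attempt 1 comes from NumPy broadcasting: a (3, 3) array cannot be compared element-wise against a (4,) array. As a rough sketch (not part of the original question), adding a trailing axis to the sales values lets every sale be compared against every boundary. The frames are rebuilt from the printed output above, and `mask`, `first_bin`, and `c` are illustrative names.

```python
import numpy as np
import pandas as pd

# Frames reconstructed from the printed output in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# (3, 3, 1) compared with (4,) broadcasts to (3, 3, 4):
# one boolean per (row, column, boundary) combination.
mask = a.to_numpy()[:, :, np.newaxis] <= b.index.to_numpy()

# argmax on booleans returns the position of the first True,
# i.e. the first boundary that is >= the sale.
first_bin = mask.argmax(axis=2)

c = pd.DataFrame(
    b["Commission"].to_numpy()[first_bin],
    index=a.index,
    columns=a.columns,
)
print(c)
```

This assumes every sale is at or below the largest boundary; otherwise `argmax` would silently return 0 for a row of all-False comparisons.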

Attempt 2:

In [59]: a
Out[59]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [60]: b
Out[60]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):

  File "<ipython-input-61-99e8134e826c>", line 1, in <module>
    c = b.lookup(a['Sales_1'],['Commission'])

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
    raise ValueError('Row labels must have same size as column labels')

ValueError: Row labels must have same size as column labels

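Can someone help me apply the lookup table to the DataFrame? It doesn't have to look exactly like this, but it illustrates my general need.

2 Answers:

Answer 0 (score: 8)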

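To work with ranges, pd.cut is your friend. Based on your current b DataFrame, you only need to amend the list of bins passed as an argument so that it defines the lowest range. Here I used 0, since negative sales do not exist, but you can add any negative number if needed, or even use -np.inf and np.inf as your lower and upper boundaries instead of 1E14.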

pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack()
Out[39]: 
  Sales_1 Sales_2 Sales_3
0    0.04    0.03    0.04
1    0.04    0.03    0.03
2    0.03    0.02    0.04
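For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes `a` and `b` as printed in the question, where Sales is the index of `b` (so the boundaries come from `b.index` rather than `b.Sales`). The names `bins`, `labels`, and `rates` are illustrative, and the final `astype(float)` is an addition that turns the categorical result back into plain numbers.

```python
import pandas as pd

# Frames reconstructed from the printed output in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# Prepend 0 so the first interval is (0, 50000].
bins = [0] + b.index.tolist()
labels = b["Commission"].tolist()

# stack() flattens the frame to one Series, cut() labels each value
# with the rate of its interval, unstack() restores the 3x3 layout.
rates = pd.cut(a.stack(), bins, labels=labels).astype(float).unstack()
print(rates)
```

pd.cut requires one label fewer than there are bin edges, which is why a lower edge has to be added to the four boundaries in `b`.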

I find the following b clearer for use with cut:

          Sales  Commission
0          -inf         NaN
1         50000        0.05
2        250000        0.04
3        750000        0.03
4           inf        0.02

The call then becomes:

pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()
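The choice of outer boundaries matters mostly at the edges. The sketch below compares finite bins with infinite ones on a few hand-picked amounts; the `edge` values are hypothetical test inputs, not data from the question, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

labels = [0.05, 0.04, 0.03, 0.02]
finite_bins = [0, 50000, 250000, 750000, 9999999999]
infinite_bins = [-np.inf, 50000, 250000, 750000, np.inf]

# Hypothetical amounts chosen to sit on or outside the boundaries.
edge = pd.Series([0, 50000, 50001, 20000000000])

# Intervals are closed on the right by default: (0, 50000] excludes 0,
# and anything above the last finite edge falls outside every bin.
finite_rates = pd.cut(edge, finite_bins, labels=labels).astype(float)

# With infinite outer edges every amount lands in some bin.
infinite_rates = pd.cut(edge, infinite_bins, labels=labels).astype(float)

print(pd.DataFrame({"sale": edge, "finite": finite_rates, "infinite": infinite_rates}))
```

With the finite bins, a sale of exactly 0 and a sale above the top edge both come back as NaN. Passing `include_lowest=True` to pd.cut is another way to keep the lower edge inside the first interval.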

Answer 1 (score: 2)

@Boud already knocked this out of the park, but here is my take.

numpy

Using searchsorted

pd.DataFrame(
    b.Commission.values[
        b.index.values.searchsorted(a.values.ravel())
    ].reshape(a.values.shape),
    a.index, a.columns)

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
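A self-contained version of the searchsorted approach might look like the sketch below. It assumes `a` and `b` as printed in the question; `bounds`, `rates_by_bin`, `pos`, and `rates` are illustrative names.

```python
import numpy as np
import pandas as pd

# Frames reconstructed from the printed output in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

bounds = b.index.to_numpy()            # sorted upper boundaries
rates_by_bin = b["Commission"].to_numpy()

# side="left" (the default) returns i with bounds[i-1] < v <= bounds[i],
# so a sale exactly on a boundary stays in the lower bracket.
pos = np.searchsorted(bounds, a.to_numpy().ravel(), side="left")

rates = pd.DataFrame(
    rates_by_bin[pos].reshape(a.shape),
    index=a.index,
    columns=a.columns,
)
print(rates)
```

A sale above the largest boundary would yield position 4 and raise an IndexError on the lookup, so the last boundary needs to be large enough for the data.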

pandas

Using pd.merge_asof
I also stack a, and I shift the boundary definitions so that each row holds the lower bound of its bracket.

a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame(dict(
        Sales=np.append(0, b.index[:-1]),
        Commissions=b.Commission.values
    ))

print(a_)
print()
print(b_)

             Sales
0 Sales_3   100000
1 Sales_1   100000
0 Sales_1   200000
2 Sales_3   200000
0 Sales_2   300000
2 Sales_1   400000
1 Sales_2   500000
  Sales_3   500000
2 Sales_2  1000000

   Commissions   Sales
0         0.05       0
1         0.04   50000
2         0.03  250000
3         0.02  750000

Now we can use pd.merge_asof

pd.merge_asof(a_, b_).set_index(a_.index).Commissions.unstack()

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
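One detail worth checking with merge_asof is what happens to a sale that sits exactly on a boundary. The sketch below reruns the approach with an explicit `on="Sales"` and then compares the default matching with `allow_exact_matches=False`. The `edge` frame is a hypothetical test input, and the other names follow the answer or are illustrative.

```python
import numpy as np
import pandas as pd

# Frames reconstructed from the printed output in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# merge_asof needs the left key sorted; keep the (row, column) index for later.
a_ = a.stack().sort_values().to_frame("Sales")

# Shift the boundaries: each rate is keyed by the lower bound of its bracket.
b_ = pd.DataFrame({
    "Sales": np.append(0, b.index[:-1]),
    "Commissions": b["Commission"].to_numpy(),
})

merged = pd.merge_asof(a_, b_, on="Sales")
rates = merged.set_index(a_.index)["Commissions"].unstack()
print(rates)

# Boundary behaviour: the default backward search matches key <= sale,
# so a sale of exactly 50000 picks up the 50000 row (4%).
edge = pd.DataFrame({"Sales": [50000, 50001]})
default_match = pd.merge_asof(edge, b_, on="Sales")["Commissions"].tolist()

# Requiring key < sale keeps 50000 in the first bracket (5%).
strict_match = pd.merge_asof(
    edge, b_, on="Sales", allow_exact_matches=False
)["Commissions"].tolist()
print(default_match, strict_match)
```

For the sample data both settings give the same table, because no sale falls exactly on a boundary. If the brackets are meant to read as $0-$50,000 = 5%, the strict variant appears to be the closer match, though a sale of exactly 0 would then have no bracket.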

Naive timing test

[timing chart image not available]
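Since the original timing figures are not available, a comparison can be rerun locally with something like the sketch below. It is only a rough harness using `timeit` on the tiny 3x3 sample, which says little about behaviour on large frames. The function names and the repeat count are arbitrary choices, and no results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Frames reconstructed from the printed output in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)


def with_cut(a, b):
    bins = [0] + b.index.tolist()
    labels = b["Commission"].tolist()
    return pd.cut(a.stack(), bins, labels=labels).astype(float).unstack()


def with_searchsorted(a, b):
    pos = b.index.to_numpy().searchsorted(a.to_numpy().ravel())
    values = b["Commission"].to_numpy()[pos].reshape(a.shape)
    return pd.DataFrame(values, a.index, a.columns)


def with_merge_asof(a, b):
    a_ = a.stack().sort_values().to_frame("Sales")
    b_ = pd.DataFrame({
        "Sales": np.append(0, b.index[:-1]),
        "Commissions": b["Commission"].to_numpy(),
    })
    merged = pd.merge_asof(a_, b_, on="Sales")
    return merged.set_index(a_.index)["Commissions"].unstack()


methods = (with_cut, with_searchsorted, with_merge_asof)
repeats = 20  # arbitrary; raise it for steadier numbers

# Total seconds for `repeats` calls of each method.
timings = {
    f.__name__: timeit.timeit(lambda f=f: f(a, b), number=repeats)
    for f in methods
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / repeats * 1e3:.3f} ms per call")
```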