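I have a DataFrame like the one shown below. Assume these are sales figures for a list of salespeople.
In addition, I have a lookup table containing commission rates by sales amount. It looks like the following. So $0-$50,000 = 5%, $50,001-$250,000 = 4%, and so on.
What I want to do is apply the lookup table to the sales table to produce the following DataFrame.
Attempt 1: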
In [66]: a
Out[66]:
Sales_1 Sales_2 Sales_3
0 200000 300000 100000
1 100000 500000 500000
2 400000 1000000 200000
In [67]: b
Out[67]:
Commission
Sales
50000 0.05
250000 0.04
750000 0.03
9999999999 0.02
In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):
File "<ipython-input-68-d229bce29f01>", line 1, in <module>
c = b['Commission'][a <= b.index.values]
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
res = self._combine_const(other, func, raise_on_error=False)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
raise_on_error=raise_on_error)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
return self.apply('eval', **kwargs)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
applied = getattr(b, f)(**kwargs)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
result = get_result(other)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
result = func(values, other)
ValueError: operands could not be broadcast together with shapes (3,3) (4,)
Attempt 2:
In [59]: a
Out[59]:
Sales_1 Sales_2 Sales_3
0 200000 300000 100000
1 100000 500000 500000
2 400000 1000000 200000
In [60]: b
Out[60]:
Commission
Sales
50000 0.05
250000 0.04
750000 0.03
9999999999 0.02
In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):
File "<ipython-input-61-99e8134e826c>", line 1, in <module>
c = b.lookup(a['Sales_1'],['Commission'])
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
raise ValueError('Row labels must have same size as column labels')
ValueError: Row labels must have same size as column labels
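Can someone help me apply the lookup table to the DataFrame? It doesn't have to look exactly like this, but it illustrates my general need.
Answer 0 (score: 8)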
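To work with ranges, pd.cut is your friend. With your current b DataFrame, you only need to modify the list of bins passed as an argument so that it defines the lowest range. Here I put 0, since negative sales don't exist, but you could also add any negative number if needed, or even use -np.inf and np.inf for your lower and upper bounds instead of 1E14. (The code uses b.Sales, so it assumes Sales is a regular column, e.g. after b.reset_index(); with Sales as the index, use b.index instead.)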
pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack()
Out[39]:
Sales_1 Sales_2 Sales_3
0 0.04 0.03 0.04
1 0.04 0.03 0.03
2 0.03 0.02 0.04
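For readers who want to run this end to end, here is a minimal self-contained sketch of the same pd.cut idea. The frames a and b are rebuilt from the output shown in the question (with Sales as the index of b), and the names bins, rates and result are only illustrative. Converting the categorical result back to float is my own addition, not part of the answer.

```python
import pandas as pd

# Sales table and lookup table, rebuilt from the question's output.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# Bin edges: 0 as the lower bound, then each upper bound from the lookup table.
bins = [0] + b.index.tolist()

# Intervals are right-closed by default: (0, 50000], (50000, 250000], ...
rates = pd.cut(a.stack(), bins=bins, labels=b["Commission"].tolist())

# pd.cut returns a categorical; cast to float so the rates can be used in arithmetic.
result = rates.astype(float).unstack()
print(result)
```

Note that with a lower edge of 0 and right-closed intervals, a sale of exactly 0 falls outside every bin and comes back as NaN unless include_lowest=True is passed.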
I find the following b clearer, and it can be passed to cut directly:
Sales Commission
0 -inf NaN
1 50000 0.05
2 250000 0.04
3 750000 0.03
4 inf 0.02
The call then becomes:
pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()
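To see how the infinite bounds behave at the bracket edges, here is a small sketch that applies the same bins to a handful of boundary values. The Series edge_sales is a made-up test input, not data from the question, and the float conversion is again my own addition.

```python
import numpy as np
import pandas as pd

# Bin edges and rates matching the b table above (first row is the -inf lower bound).
bins = [-np.inf, 50000, 250000, 750000, np.inf]
rates = [0.05, 0.04, 0.03, 0.02]

# Hypothetical boundary values chosen to probe the bracket edges.
edge_sales = pd.Series([0, 50000, 50001, 250000, 750001])

# Right-closed intervals: 50000 stays in the 5% bracket, 50001 moves to 4%.
edge_rates = pd.cut(edge_sales, bins=bins, labels=rates).astype(float)
print(pd.DataFrame({"Sales": edge_sales, "Commission": edge_rates}))
```

With -np.inf as the first edge, a sale of 0 lands in the 5% bracket rather than becoming NaN, which is one practical reason to prefer this form.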
Answer 1 (score: 2)
numpy
Using searchsorted
pd.DataFrame(
b.Commission.values[
b.index.values.searchsorted(a.values.ravel())
].reshape(a.values.shape),
a.index, a.columns)
Sales_1 Sales_2 Sales_3
0 0.04 0.03 0.04
1 0.04 0.03 0.03
2 0.03 0.02 0.04
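The same lookup can be written as a runnable sketch, again with a and b rebuilt from the question's output. It relies on np.searchsorted accepting a 2-D array and returning positions of the same shape, so the ravel/reshape step is not needed here; the names upper_bounds and positions are illustrative.

```python
import numpy as np
import pandas as pd

# Sales table and lookup table, rebuilt from the question's output.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# The index holds the (sorted) upper bound of each bracket.
upper_bounds = b.index.values

# side="left" returns the first position whose bound is >= the sale,
# so a sale equal to a bound stays in that bound's bracket.
positions = np.searchsorted(upper_bounds, a.values, side="left")

# Use the positions to pick the rate for every cell at once.
result = pd.DataFrame(b["Commission"].values[positions],
                      index=a.index, columns=a.columns)
print(result)
```

A sale larger than the last bound would produce a position past the end of the rates array and raise an IndexError, which is why the table ends with a very large sentinel value.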
pandas
Using pd.merge_asof
I also stack a, and I shift the boundary definitions as well
a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame(dict(
Sales=np.append(0, b.index[:-1]),
Commissions=b.Commission.values
))
print(a_)
print()
print(b_)
Sales
0 Sales_3 100000
1 Sales_1 100000
0 Sales_1 200000
2 Sales_3 200000
0 Sales_2 300000
2 Sales_1 400000
1 Sales_2 500000
Sales_3 500000
2 Sales_2 1000000
Commissions Sales
0 0.05 0
1 0.04 50000
2 0.03 250000
3 0.02 750000
Now we can use pd.merge_asof
pd.merge_asof(a_, b_).set_index(a_.index).Commissions.unstack()
Sales_1 Sales_2 Sales_3
0 0.04 0.03 0.04
1 0.04 0.03 0.03
2 0.03 0.02 0.04
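One subtlety: by default pd.merge_asof searches backward and accepts exact matches, so a sale of exactly 50000 would match the 50000 row and get 4%, whereas the question puts $50,000 in the 5% bracket. A possible variant, sketched below with a and b rebuilt from the question's output, passes allow_exact_matches=False so that only strictly smaller lower bounds match. The names left, right, merged and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sales table and lookup table, rebuilt from the question's output.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# merge_asof needs both sides sorted by the key.
left = a.stack().sort_values().to_frame("Sales")

# Shift the boundaries so each row holds the lower bound of its bracket.
right = pd.DataFrame({
    "Sales": np.append(0, b.index[:-1]),
    "Commission": b["Commission"].values,
})

# Backward search for the last lower bound strictly below each sale.
merged = pd.merge_asof(left, right, on="Sales", allow_exact_matches=False)

# Restore the (row, column) labels, then pivot back to the original shape.
result = merged.set_index(left.index)["Commission"].unstack()
print(result)
```

For the sample data no sale sits exactly on a boundary, so the output is the same as above. With this variant a sale of exactly 0 finds no match and becomes NaN.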
Naive timing test