I have a dictionary like this:
Dict={'A':0.0697,'B':0.1136,'C':0.2227,'D':0.2725,'E':0.4555}
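I want the output to work like this: if a value in my DataFrame is less than 0.0697, 0.1136, 0.2227, 0.2725, or 0.4555, return A, B, C, D, or E respectively (the label of the smallest threshold that the value falls below); otherwise return F.
I tried: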
TrainTest['saga1'] = TrainTest['saga'].apply(lambda x,v: Dict[x] if x<=v else 'F')
But it raises an error:
TypeError: <lambda>() takes exactly 2 arguments (1 given)
Answer 0 (score: 2)
Let's make some test data:
import pandas as pd
saga = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.9])
Next, recognize that Dict is a dict and is not sorted by value, so let's sort its items by threshold in descending order:
thresh = sorted(Dict.items(), key=lambda t: t[1], reverse=True)
Finally, solve the problem by looping over thresh rather than over saga. Loops (and apply()) in Python/pandas are slow, and we assume saga is much longer than thresh:
result = pd.Series('F', index=saga.index)  # all F's to start
for name, value in thresh:
    # boolean mask: overwrite every row whose value is below this threshold
    result[saga < value] = name
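Now result is the appropriate Series of values A, B, C, D, E, F. We loop in descending order so that smaller thresholds overwrite larger ones: for example, 0 is less than every threshold and should be labeled A, not E.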
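The same "first threshold the value falls below" rule can also be expressed as binning. The following is a minimal sketch using pd.cut, not the method used in this answer; the names edges, labels, and result_cut are introduced here only for illustration, and the infinite outer edges are an assumption so that every value lands in some bin.

```python
import numpy as np
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
saga = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.9])

# Sort ascending by threshold so the bin edges increase monotonically.
items = sorted(Dict.items(), key=lambda t: t[1])
labels = [name for name, _ in items] + ['F']  # 'F' is the catch-all bin
edges = [-np.inf] + [value for _, value in items] + [np.inf]

# right=False makes each bin [low, high), so a value gets a label only
# when it is strictly less than that label's threshold.
result_cut = pd.cut(saga, bins=edges, labels=labels, right=False).astype(str)
print(result_cut.tolist())
```

With right=False a value exactly equal to a threshold moves to the next label, which matches the strict `<` comparison in the loop above.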
Answer 1 (score: 1)
Regarding run time, here is the timing of the loop from the answer above:
In [160]: %%timeit
# loop over the much shorter thresh, not over saga
for name, value in thresh:
    result[saga < value] = name
100 loops, best of 3: 2.59 ms per loop
Here are the timings for element-wise approaches in pandas, using this test data and helper function:
saga1 = pd.DataFrame([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9], columns=['c1'])
def mapF(s):
    # thresh is sorted descending, so the last match is the smallest threshold above s
    curr = 'F'
    for name, value in thresh:
        if s < value:
            curr = name
    return curr
Using map / apply:
In [149]: %%timeit
saga1['result'] = saga1['c1'].map(lambda x: mapF(x) )
1000 loops, best of 3: 311 µs per loop
Using np.vectorize (a convenience wrapper that still calls mapF once per element):
In [166]: %%timeit
import numpy as np
saga1['result'] = np.vectorize(mapF)(saga1['c1'])
1000 loops, best of 3: 244 µs per loop
The resulting saga1:
+---+------+--------+
| | c1 | result |
+---+------+--------+
| 0 | 0.05 | A |
| 1 | 0.1 | B |
| 2 | 0.2 | C |
| 3 | 0.3 | E |
| 4 | 0.4 | E |
| 5 | 0.5 | F |
| 6 | 0.9 | F |
+---+------+--------+
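For comparison, here is a sketch of a lookup that does the work in a single NumPy call rather than calling a Python function per element. It is an alternative to the approaches timed above, not something measured in this answer; np.searchsorted is a real NumPy function, while edges, labels, idx, and the result_fast column are names made up for this example.

```python
import numpy as np
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
saga1 = pd.DataFrame([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9], columns=['c1'])

# searchsorted needs the thresholds in ascending order.
items = sorted(Dict.items(), key=lambda t: t[1])
edges = np.array([value for _, value in items])
labels = np.array([name for name, _ in items] + ['F'])  # index 5 is the fallback

# side='right' counts how many thresholds are <= x, which is the index of
# the first threshold strictly greater than x (or len(edges) if none is).
idx = np.searchsorted(edges, saga1['c1'].to_numpy(), side='right')
saga1['result_fast'] = labels[idx]
print(saga1)
```

The result_fast column should match the result column in the table above. No timing is claimed here; measure it with %%timeit on your own data before relying on it.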