How do I pass a condition to a lambda?

Time: 2018-02-11 03:21:45

Tags: python pandas lambda

I have a dictionary like this:

Dict={'A':0.0697,'B':0.1136,'C':0.2227,'D':0.2725,'E':0.4555} 

I want my output to work like this: if a value in my DataFrame is less than 0.0697, 0.1136, 0.2227, 0.2725, or 0.4555, return A, B, C, D, or E respectively; otherwise return F.

I tried:

TrainTest['saga1'] = TrainTest['saga'].apply(lambda x,v: Dict[x] if x<=v else 'F')

But it raises an error:

TypeError: <lambda>() takes exactly 2 arguments (1 given)

2 answers:

Answer 0 (score: 2)

Let's make some test data:

import pandas as pd
saga = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.9])

Next, recognize that Dict is a dict, which is not sorted, so let's sort its items by value in descending order:

thresh = sorted(Dict.items(), key=lambda t: t[1], reverse=True)

Finally, solve the problem by looping over thresh rather than over saga, because loops (and apply()) in Python/Pandas are slow, and we assume saga is much longer than thresh:

result = pd.Series('F', saga.index) # all F's to start
for name, value in thresh:
    result[saga < value] = name

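Now result is the appropriate Series of the labels A, B, C, D, E, F. We loop in descending order because, for example, 0 is less than every threshold and should be labeled A, not E.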
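
To make the ordering argument concrete, here is a minimal sketch that runs the same mask-assignment loop in both orders. The helper name `label` and the extra test value 0.0 are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
saga = pd.Series([0.0, 0.1, 0.3, 0.9])

def label(values, items):
    """Run the mask-assignment loop over thresholds in the given order."""
    result = pd.Series('F', index=values.index)  # everything starts as 'F'
    for name, value in items:
        result[values < value] = name  # later assignments overwrite earlier ones
    return result

descending = sorted(Dict.items(), key=lambda t: t[1], reverse=True)
ascending = sorted(Dict.items(), key=lambda t: t[1])

good = label(saga, descending).tolist()
bad = label(saga, ascending).tolist()
print(good)  # ['A', 'B', 'E', 'F']
print(bad)   # ['E', 'E', 'E', 'F']
```

With ascending order, the last (largest) threshold overwrites every earlier label, so anything below 0.4555 ends up as E.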

Answer 1 (score: 1)

Regarding run time:

In [160]:%%timeit
# loop over the short thresh list rather than the much longer saga
for name, value in thresh:
    result[saga < value] = name
100 loops, best of 3: 2.59 ms per loop

Here is the run time with pandas, using the following setup:

saga1 = pd.DataFrame([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9], columns=['c1'])
def mapF(s):
    # thresh is sorted in descending order, so the smallest matching threshold wins
    curr = 'F'
    for name, value in thresh:
        if s < value:
            curr = name
    return curr

Using map / apply:

In [149]: %%timeit
saga1['result'] = saga1['c1'].map(lambda x: mapF(x) )
1000 loops, best of 3: 311 µs per loop
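
As a side note on the TypeError in the question: apply() calls the function with one element at a time, so a two-argument lambda needs its second argument supplied some other way. A minimal sketch of two options, assuming a single hypothetical cutoff `v` and a placeholder label 'below' (neither is from the original post):

```python
import pandas as pd

saga = pd.Series([0.05, 0.3, 0.9])
v = 0.2725  # hypothetical cutoff, chosen only for illustration

# Option 1: pass the extra positional argument through `args`.
via_args = saga.apply(lambda x, v: 'below' if x <= v else 'F', args=(v,))

# Option 2: let a one-argument lambda capture `v` from the enclosing scope.
via_closure = saga.apply(lambda x: 'below' if x <= v else 'F')

print(via_args.tolist())     # ['below', 'F', 'F']
print(via_closure.tolist())  # ['below', 'F', 'F']
```

Either form avoids the error, but a single cutoff is not enough for the A to F labeling, which is why mapF above loops over all thresholds.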

Using vectorization:

In [166]:%%timeit
import numpy as np
saga1['result'] = np.vectorize(mapF)(saga1['c1'])
1000 loops, best of 3: 244 µs per loop

Resulting saga1:
+---+------+--------+
|   |  c1  | result |
+---+------+--------+
| 0 | 0.05 |   A    |
| 1 | 0.1  |   B    |
| 2 | 0.2  |   C    |
| 3 | 0.3  |   E    |
| 4 | 0.4  |   E    |
| 5 | 0.5  |   F    |
| 6 | 0.9  |   F    |
+---+------+--------+
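
Note that np.vectorize still calls mapF once per element in Python. As a possible alternative that neither answer uses, the lookup can be done at the array level with np.searchsorted. This is a sketch under the assumption that the thresholds are sorted ascending and that 'F' is appended as the fallback label; the names `bounds`, `labels`, and `idx` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
saga1 = pd.DataFrame([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9], columns=['c1'])

items = sorted(Dict.items(), key=lambda t: t[1])    # ascending by threshold
bounds = np.array([value for _, value in items])
labels = np.array([name for name, _ in items] + ['F'])  # 'F' is the fallback

# side='right' counts the thresholds that are <= each value, so a value
# strictly below the first threshold gets index 0 ('A'), and a value at or
# above the last threshold gets index len(bounds) ('F').
idx = np.searchsorted(bounds, saga1['c1'].to_numpy(), side='right')
saga1['result'] = labels[idx]

print(saga1['result'].tolist())  # ['A', 'B', 'C', 'E', 'E', 'F', 'F']
```

This reproduces the table above for the same input; no timing is claimed for it here.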