Question

I have a dictionary like this:

Dict={'A':0.0697,'B':0.1136,'C':0.2227,'D':0.2725,'E':0.4555}

I want the output to work like this: if a value in my DataFrame is less than 0.0697, 0.1136, 0.2227, 0.2725, or 0.4555, return A, B, C, D, or E respectively; otherwise return F.

I tried:

TrainTest['saga1'] = TrainTest['saga'].apply(lambda x,v: Dict[x] if x<=v else 'F')

But it raises an error:

TypeError: <lambda>() takes exactly 2 arguments (1 given)

Answer 1

Let's make some test data:

import pandas as pd
saga = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.9])

Next, note that Dict is a dict and is not sorted, so let's sort its items by value in descending order:

thresh = sorted(Dict.items(), key=lambda t: t[1], reverse=True)

Finally, solve the problem by looping over thresh rather than over saga. Loops (and apply()) in Python / pandas are slow, and we assume saga is much longer than thresh:

result = pd.Series('F', saga.index) # all F's to start
for name, value in thresh:
    result[saga < value] = name

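Now result is the appropriate Series of labels A, B, C, D, E, F. We loop in descending order because, for example, 0 is less than every threshold and should be labeled A, not E: each smaller threshold overwrites the label set by the larger one before it.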
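Putting the steps above together, a minimal self-contained sketch looks like the following. It only combines the snippets already shown (same Dict, same test data); the `print` at the end is added here for illustration.

```python
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
saga = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.9])

# Largest threshold first, so that smaller thresholds overwrite it later.
thresh = sorted(Dict.items(), key=lambda t: t[1], reverse=True)

result = pd.Series('F', index=saga.index)  # every row starts as 'F'
for name, value in thresh:
    # Boolean-mask assignment: relabel all rows below this threshold at once.
    result[saga < value] = name

print(result.tolist())  # ['B', 'C', 'E', 'E', 'F', 'F']
```

Note that 0.3 and 0.4 come out as E, because they are above D's threshold (0.2725) but below E's (0.4555).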

Answer 2

Regarding run time:

In [160]: %%timeit
# loop over the short thresh list, not the much longer saga
for name, value in thresh:
    result[saga < value] = name
100 loops, best of 3: 2.59 ms per loop

Here are the run times for element-wise approaches in pandas:

saga1 = pd.DataFrame([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9], columns=['c1'])

def mapF(s):
    # thresh is in descending order, so the last match is the smallest threshold above s
    curr = 'F'
    for name, value in thresh:
        if s < value:
            curr = name
    return curr

Using map / apply:

In [149]: %%timeit
saga1['result'] = saga1['c1'].map(lambda x: mapF(x) )
1000 loops, best of 3: 311 µs per loop
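As for the TypeError in the question: `Series.apply` calls the function with a single argument (the element), so a two-argument lambda fails unless the extra value is supplied. One way to supply it is the `args=` parameter of `Series.apply`. The sketch below assumes a hypothetical helper named `label` (not from the original posts) that takes the thresholds as a second argument.

```python
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
thresholds = sorted(Dict.items(), key=lambda t: t[1])  # ascending by value


def label(x, thresholds, default='F'):
    """Hypothetical helper: return the label of the first threshold above x."""
    for name, value in thresholds:
        if x < value:
            return name
    return default


saga = pd.Series([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9])

# args=(thresholds,) is passed as the second positional argument on every call.
labels = saga.apply(label, args=(thresholds,))
print(labels.tolist())  # ['A', 'B', 'C', 'E', 'E', 'F', 'F']
```

This is still a per-element Python call, so it clarifies the error rather than improving speed.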

Using np.vectorize (a convenience wrapper that still calls mapF once per element):

In [166]:%%timeit
import numpy as np
saga1['result'] = np.vectorize(mapF)(saga1['c1'])
1000 loops, best of 3: 244 µs per loop

Resulting saga1:
+---+------+--------+
|   |  c1  | result |
+---+------+--------+
| 0 | 0.05 |   A    |
| 1 | 0.1  |   B    |
| 2 | 0.2  |   C    |
| 3 | 0.3  |   E    |
| 4 | 0.4  |   E    |
| 5 | 0.5  |   F    |
| 6 | 0.9  |   F    |
+---+------+--------+
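For comparison, the same labels can be produced without calling a Python function per element. The sketch below swaps in `np.searchsorted`, which is not used in either answer, and assumes the thresholds are sorted in ascending order; the names `bounds`, `labels`, and `idx` are illustrative.

```python
import numpy as np
import pandas as pd

Dict = {'A': 0.0697, 'B': 0.1136, 'C': 0.2227, 'D': 0.2725, 'E': 0.4555}
items = sorted(Dict.items(), key=lambda t: t[1])  # ascending by threshold

bounds = np.array([value for _, value in items])
labels = np.array([name for name, _ in items] + ['F'])  # 'F' is the fallback

saga1 = pd.DataFrame([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.9], columns=['c1'])

# side='right' returns the number of thresholds <= x, which is the index of
# the first threshold strictly greater than x (or len(bounds) if none is).
idx = np.searchsorted(bounds, saga1['c1'].to_numpy(), side='right')
saga1['result'] = labels[idx]

print(saga1['result'].tolist())  # ['A', 'B', 'C', 'E', 'E', 'F', 'F']
```

With `side='right'`, a value exactly equal to a threshold falls into the next bin, which matches the strict `<` comparison used in mapF.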

How do I pass a condition to a lambda?
