I have the following data points in a pandas DataFrame:
DateTime Data
2017-11-21 18:54:31 1
2017-11-22 02:26:48 2
2017-11-22 10:19:44 3
2017-11-22 15:11:28 6
2017-11-22 23:21:58 7
2017-11-28 14:28:28 28
2017-11-28 14:36:40 0
2017-11-28 14:59:48 1
I want to apply a function that converts every data value greater than 1 to 1. Is there a way to combine the following two lambda functions into one (like an else statement)?
[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x < 1]
The desired final result is the same column with every value greater than 1 replaced by 1.
Answer 0 (score: 4)
Use np.clip:
import numpy as np
df['Data'] = np.clip(df.Data.values, a_min=None, a_max=1)
df
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
Pass a_min=None to specify no lower bound.
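As a rough, self-contained sketch of the np.clip approach: the sample frame below is a hypothetical reconstruction of the question's Data column, and the lower bound is passed positionally as None so that only the upper cap applies.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the 'Data' column from the question.
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})

# None as the lower bound means "no lower limit"; 1 is the upper bound.
capped = np.clip(df["Data"].to_numpy(), None, 1)

# Values below the cap, including negatives, pass through unchanged.
with_negative = np.clip(np.array([-3, 0, 5]), None, 1)

print(capped)
print(with_negative)
```

Because there is no lower bound, this variant stays correct even if the column later contains negative values.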
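Answer 1 (score: 3)
You can use clip_upper (deprecated in pandas 0.24 and removed in 1.0; on newer versions use df['Data'].clip(upper=1) instead):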
df['Data'] = df['Data'].clip_upper(1)
Or, if there are no negative values, use ge (>=) to build a boolean mask and convert it to int:
df['Data'] = df['Data'].ge(1).astype(int)
print (df)
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
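To illustrate why the mask approach only works without negative values, here is a small sketch on a hypothetical series that includes one. It uses Series.clip(upper=1), which is assumed here as the current spelling of the upper-bound clip.

```python
import pandas as pd

# Hypothetical series with a negative value to show where the two approaches differ.
s = pd.Series([-2, 0, 1, 5])

# clip(upper=1) caps values at 1 and leaves smaller values untouched.
clipped = s.clip(upper=1)

# ge(1) builds a boolean mask; astype(int) maps True/False to 1/0,
# so the negative value becomes 0 instead of being preserved.
masked = s.ge(1).astype(int)

print(clipped.tolist())
print(masked.tolist())
```

The two results agree only for inputs that are already 0 or at least 1, which is the case for the data in the question.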
But if you want to use a list comprehension (it should be slower on larger DataFrames):
df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
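For the literal question of merging the two lambdas, a conditional expression inside a single lambda is one way to do it. The sketch below uses a plain list as a hypothetical stand-in for df['Data'], and the name cap_at_one is made up for illustration. It tests x > 1, following the question's wording, which differs from x > 0 only for values strictly between 0 and 1.

```python
# Hypothetical plain-list stand-in for df['Data'].
data = [1, 2, 3, 6, 7, 28, 0, 1]

# A single lambda with a conditional expression replaces the two filtered ones.
cap_at_one = lambda x: 1 if x > 1 else x

result = [cap_at_one(x) for x in data]
print(result)
```

Unlike the two filtered comprehensions in the question, this keeps every element in its original position, so the result can be assigned straight back to the column.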
Timings:
#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)
In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop
In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop
In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop
#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop
In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop
In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
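Outside IPython, a comparable measurement could be sketched with the standard timeit module. The frame and the three helper function names below are hypothetical, clip(upper=1) stands in for clip_upper(1), and the absolute numbers will vary with hardware and library versions.

```python
import timeit
import pandas as pd

# Hypothetical frame: the question's eight values repeated 1000 times (8000 rows).
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1] * 1000})

def with_clip():
    return df["Data"].clip(upper=1)

def with_mask():
    return df["Data"].ge(1).astype(int)

def with_comprehension():
    return pd.Series([1 if x > 0 else x for x in df["Data"]])

# Time each approach over the same number of calls.
runs = 20
timings = {
    f.__name__: timeit.timeit(f, number=runs)
    for f in (with_clip, with_mask, with_comprehension)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / runs * 1e3:.3f} ms per call")
```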