Setting a maximum value in a DataFrame column

Time: 2017-12-19 06:17:59

Tags: python pandas numpy dataframe

I have the following data points in a pandas DataFrame:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     2
2017-11-22 10:19:44     3
2017-11-22 15:11:28     6
2017-11-22 23:21:58     7
2017-11-28 14:28:28    28
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

I would like to apply a function that converts every Data value greater than 1 to 1. Is there a way to combine the following two lambda functions into one (like an else statement)?

[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]

Final result:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     1
2017-11-22 10:19:44     1
2017-11-22 15:11:28     1
2017-11-22 23:21:58     1
2017-11-28 14:28:28     1
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

2 answers:

Answer 0 (score: 4)

A NumPy solution using np.clip:

df['Data'] = np.clip(df.Data.values, a_min=None, a_max=1)
df

              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1

Pass a_min=None to specify that there is no lower bound.
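
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption about how the question's data was loaded; only the np.clip call comes from the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (construction is illustrative).
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2017-11-21 18:54:31", "2017-11-22 02:26:48",
        "2017-11-22 10:19:44", "2017-11-22 15:11:28",
        "2017-11-22 23:21:58", "2017-11-28 14:28:28",
        "2017-11-28 14:36:40", "2017-11-28 14:59:48",
    ]),
    "Data": [1, 2, 3, 6, 7, 28, 0, 1],
})

# a_min=None leaves the lower end unbounded; a_max=1 caps every value above 1.
df["Data"] = np.clip(df["Data"].to_numpy(), a_min=None, a_max=1)
print(df["Data"].tolist())  # [1, 1, 1, 1, 1, 1, 0, 1]
```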

Answer 1 (score: 3)

You can use clip_upper:

df['Data'] = df['Data'].clip_upper(1)
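
Note for readers on newer pandas: Series.clip_upper was deprecated in pandas 0.24 and removed in 1.0. The sketch below uses Series.clip with the upper keyword, which should give the same result; the Series here is a stand-in for df['Data'] built from the question's values.

```python
import pandas as pd

# Stand-in for df['Data'] from the question.
s = pd.Series([1, 2, 3, 6, 7, 28, 0, 1], name="Data")

# clip(upper=1) caps values above 1 and leaves everything else untouched.
capped = s.clip(upper=1)
print(capped.tolist())  # [1, 1, 1, 1, 1, 1, 0, 1]
```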

Or, if there are no negative values, use ge (>=) to build a boolean mask and convert it to int:

df['Data'] = df['Data'].ge(1).astype(int)

print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
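
To see why the "no negative values" condition matters for the ge approach, here is a small sketch on made-up values (not from the question) that include a negative number. The mask turns everything below 1 into 0, while clipping leaves those values alone.

```python
import pandas as pd

# Hypothetical values chosen to include a negative number.
s = pd.Series([-3, 0, 1, 5])

mask_result = s.ge(1).astype(int)  # True/False mask cast to 1/0
clip_result = s.clip(upper=1)      # only values above 1 are changed

print(mask_result.tolist())  # [0, 0, 1, 1]
print(clip_result.tolist())  # [-3, 0, 1, 1]
```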

But if you want to use a list comprehension (it should be slower on larger DataFrames):

df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
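
Coming back to the original question about merging the two lambdas: a conditional expression inside a single lambda plays the role of the else branch. The sketch below is one possible way to write it, using the threshold from the question's wording (greater than 1); the column names Capped and CappedWhere are made up for illustration, and np.where is shown as a vectorized counterpart.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})

# One lambda with an if/else expression instead of two filtered lambdas.
cap = lambda x: 1 if x > 1 else x
df["Capped"] = df["Data"].apply(cap)

# Vectorized equivalent: pick 1 where the condition holds, else the original value.
df["CappedWhere"] = np.where(df["Data"] > 1, 1, df["Data"])

print(df["Capped"].tolist())       # [1, 1, 1, 1, 1, 1, 0, 1]
print(df["CappedWhere"].tolist())  # [1, 1, 1, 1, 1, 1, 0, 1]
```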

Timings:

#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop

In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop

In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop

#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop

In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop

In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
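
The %timeit lines above only work inside IPython. As a rough sketch of how the comparison could be repeated in a plain script, the standard timeit module can be used instead. The names base, candidates and timings are made up for this example, clip(upper=1) stands in for clip_upper on newer pandas, and the numbers will differ by machine and library version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})
df = pd.concat([base] * 1000).reset_index(drop=True)  # 8000 rows

# The three approaches from the answer, wrapped as zero-argument callables.
candidates = {
    "clip": lambda: df["Data"].clip(upper=1),
    "ge_mask": lambda: df["Data"].ge(1).astype(int),
    "list_comp": lambda: [1 if x > 0 else x for x in df["Data"]],
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 20 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{name}: {timings[name] * 1e6:.0f} µs per call")
```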