numpy.where vs pandas.Series.map

Time: 2018-10-03 09:45:03

Tags: python pandas numpy

I'd like to understand these two approaches. Which one should I use, or is there a better way?

import numpy as np
import pandas as pd

df = pd.DataFrame({'values' : [1, 27, 256, 312, ...]})
df['clip_values'] = df['values'].map(lambda x : 20 if x > 20 else x)
df['clip_values_v2'] = np.where(df['values'] > 20, 20, df['values'])

Thanks

1 answer:

Answer 0 (score: 2)

Not sure whether it's better, but here is one using clip:

df['clip_values'] = df['values'].values.clip(max=20)
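
As a quick sanity check, the sketch below runs the three approaches from this thread on a small frame and confirms they agree. The sample values are an assumption (the list in the question is truncated), and `Series.clip(upper=20)` is an extra pandas-native variant that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed sample data; the list in the question is elided.
df = pd.DataFrame({'values': [1, 27, 256, 312, 20, 5]})

# 1) Element-wise Python lambda (from the question).
via_map = df['values'].map(lambda x: 20 if x > 20 else x)

# 2) Vectorized conditional (from the question); returns a NumPy array.
via_where = np.where(df['values'] > 20, 20, df['values'])

# 3) ndarray.clip on the underlying array (from the answer).
via_ndarray_clip = df['values'].values.clip(max=20)

# 4) Series.clip, not in the original thread; keeps the index.
via_series_clip = df['values'].clip(upper=20)

results = [via_map.tolist(), via_where.tolist(),
           via_ndarray_clip.tolist(), via_series_clip.tolist()]
print(results[0])
print(all(r == results[0] for r in results))
```

Note that `np.where` and `.values.clip` return plain arrays, so assigning them back to a column relies on positional alignment, while `Series.clip` returns a Series with the original index.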

Timings on a larger dataset:

In [172]: df = pd.DataFrame({'values' : np.random.randint(0,100,(1000000))})

In [173]: %timeit df['clip_values'] = df['values'].map(lambda x : 20 if x > 20 else x)
1 loop, best of 3: 193 ms per loop

In [174]: %timeit df['clip_values_v2'] = np.where(df['values'] > 20, 20, df['values'])
100 loops, best of 3: 6.12 ms per loop

In [175]: %timeit df['clip_values_v3'] = df['values'].values.clip(max=20)
100 loops, best of 3: 2.95 ms per loop
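
Outside IPython, a comparison like the one above can be approximated with the standard `timeit` module. This is only a sketch: the row count `n` is an assumption chosen to keep the run short (the session above used 1,000,000 rows), it times the computation without the column assignment, and absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # assumed size, smaller than the 1,000,000 rows timed above
df = pd.DataFrame({'values': np.random.randint(0, 100, n)})

# The three approaches from this thread, wrapped as zero-argument callables.
candidates = {
    'map': lambda: df['values'].map(lambda x: 20 if x > 20 else x),
    'np.where': lambda: np.where(df['values'] > 20, 20, df['values']),
    'ndarray.clip': lambda: df['values'].values.clip(max=20),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 5 calls per repeat, reported as ms per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    timings[name] = best * 1e3
    print(f'{name:>12}: {timings[name]:.3f} ms per call')
```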