How do I set the maximum element of each row in a pandas DataFrame to 1 and all other elements to 0?

Asked: 2015-09-01 05:50:35

Tags: pandas

Let's say I have a pandas DataFrame.

df = pd.DataFrame(index = [ix for ix in range(10)], columns=list('abcdef'), data=np.random.randn(10,6))

df:

         a         b         c         d         e         f
0 -1.238393 -0.755117 -0.228638 -0.077966  0.412947  0.887955
1 -0.342087  0.296171  0.177956  0.701668 -0.481744 -1.564719
2  0.610141  0.963873 -0.943182 -0.341902  0.326416  0.818899
3 -0.561572  0.063588 -0.195256 -1.637753  0.622627  0.845801
4 -2.506322 -1.631023  0.506860  0.368958  1.833260  0.623055
5 -1.313919 -1.758250 -1.082072  1.266158  0.427079 -1.018416
6 -0.781842  1.270133 -0.510879 -1.438487 -1.101213 -0.922821
7 -0.456999  0.234084  1.602635  0.611378 -1.147994  1.204318
8  0.497074  0.412695 -0.458227  0.431758  0.514382 -0.479150
9 -1.289392 -0.218624  0.122060  2.000832 -1.694544  0.773330

How do I set the row-wise max to 1 and all other elements to 0?

I came up with:

>>> for i in range(len(df)):
...     df.loc[i][df.loc[i].idxmax(axis=1)] = 1
...     df.loc[i][df.loc[i] != 1] = 0

Yielding df:

   a  b  c  d  e  f
0  0  0  0  0  0  1
1  0  0  0  1  0  0
2  0  1  0  0  0  0
3  0  0  0  0  0  1
4  0  0  0  0  1  0
5  0  0  0  1  0  0
6  0  1  0  0  0  0
7  0  0  1  0  0  0
8  0  0  0  0  1  0
9  0  0  0  1  0  0

Does anyone have a better way of doing this? Maybe by getting rid of the for loop or by applying a lambda?

3 answers:

Answer 0: (score: 1)

Use max and check for equality using eq, then cast the boolean df to int using astype, which converts True and False to 1 and 0:

In [21]:
df = pd.DataFrame(index = [ix for ix in range(10)], columns=list('abcdef'), data=np.random.randn(10,6))
df

Out[21]:
          a         b         c         d         e         f
0  0.797000  0.762125 -0.330518  1.117972  0.817524  0.041670
1  0.517940  0.357369 -1.493552 -0.947396  3.082828  0.578126
2  1.784856  0.672902 -1.359771 -0.090880 -0.093100  1.099017
3 -0.493976 -0.390801 -0.521017  1.221517 -1.303020  1.196718
4  0.687499 -2.371322 -2.474101 -0.397071  0.132205  0.034631
5  0.573694 -0.206627 -0.106312 -0.661391 -0.257711 -0.875501
6 -0.415331  1.185901  1.173457  0.317577 -0.408544 -1.055770
7 -1.564962 -0.408390 -1.372104 -1.117561 -1.262086 -1.664516
8 -0.987306  0.738833 -1.207124  0.738084  1.118205 -0.899086
9  0.282800 -1.226499  1.658416 -0.381222  1.067296 -1.249829

In [22]:
df = df.eq(df.max(axis=1), axis=0).astype(int)
df

Out[22]:
   a  b  c  d  e  f
0  0  0  0  1  0  0
1  0  0  0  0  1  0
2  1  0  0  0  0  0
3  0  0  0  1  0  0
4  1  0  0  0  0  0
5  1  0  0  0  0  0
6  0  1  0  0  0  0
7  0  1  0  0  0  0
8  0  0  0  0  1  0
9  0  0  1  0  0  0
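
To make the three steps easier to follow, here is a minimal sketch that splits the one-liner into its parts on a small hand-written frame. The data and the intermediate names row_max and is_max are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small hand-written frame (illustrative data, not taken from the question).
df = pd.DataFrame(
    {"a": [1.0, 5.0, 2.0], "b": [3.0, 4.0, 2.0], "c": [2.0, 0.5, -1.0]}
)

row_max = df.max(axis=1)         # Series holding one maximum per row
is_max = df.eq(row_max, axis=0)  # align row_max on the index and compare each row to its own max
result = is_max.astype(int)      # True/False become 1/0

print(result)
```

Note that the last row contains a tie (a and b are both 2.0), so both columns are marked with 1. An idxmax-based approach, such as the loop in the question, would pick only the first maximum instead.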

Timings

You can see that my method is over 12x faster than @Raihan's method:

In [24]:
# @Raihan Masud's method
%timeit df.apply( lambda x: np.where(x == x.max() , 1 , 0) , axis = 1)
# mine
%timeit df.eq(df.max(axis=1), axis=0).astype(int)
100 loops, best of 3: 7.94 ms per loop
1000 loops, best of 3: 640 µs per loop

In [25]:
%%timeit
# @Nader Hisham's method
def max_binary(df):
    binary = np.where( df == df.max() , 1 , 0 )
    return binary

df.apply( max_binary , axis = 1)
100 loops, best of 3: 9.63 ms per loop

In [4]:
%%timeit
for i in range(len(df)):
    df.loc[i][df.loc[i].idxmax(axis=1)] = 1
    df.loc[i][df.loc[i] != 1] = 0
10 loops, best of 3: 21.1 ms per loop

The for loop is also significantly slower.
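
The %timeit magics used above only run inside IPython. As a rough sketch, a similar comparison might be reproduced in a plain script with the standard-library timeit module. The helper names vectorized and row_by_row, the seed, and number=50 are illustrative assumptions, and the row-wise lambda here returns a Series rather than the np.where array used in the other answers. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the data is repeatable
df = pd.DataFrame(rng.standard_normal((10, 6)), columns=list("abcdef"))


def vectorized(frame):
    # Compare every row against its own maximum in a single call.
    return frame.eq(frame.max(axis=1), axis=0).astype(int)


def row_by_row(frame):
    # Call a Python function once per row; returning a Series keeps the column labels.
    return frame.apply(lambda row: (row == row.max()).astype(int), axis=1)


# Check that both approaches agree before comparing their speed.
same_result = np.array_equal(vectorized(df).to_numpy(), row_by_row(df).to_numpy())

t_vectorized = timeit.timeit(lambda: vectorized(df), number=50)
t_row_by_row = timeit.timeit(lambda: row_by_row(df), number=50)
print(f"same result: {same_result}")
print(f"vectorized: {t_vectorized:.4f} s, row by row: {t_row_by_row:.4f} s")
```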

Answer 1: (score: 0)

import numpy as np


def max_binary(df):
        binary = np.where( df == df.max() , 1 , 0 )
        return binary


df.apply( max_binary , axis = 1)

Answer 2: (score: 0)

Following Nader's pattern, this is a shorter version:

df.apply( lambda x: np.where(x == x.max() , 1 , 0) , axis = 1)
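
The same np.where pattern can also be applied to the whole array at once instead of once per row, which avoids calling a Python function for every row. The following is a hedged sketch under that assumption; the sample data and the names values, row_max, and binary are illustrative and not taken from the answers.

```python
import numpy as np
import pandas as pd

# Small hand-written frame (illustrative data).
df = pd.DataFrame(
    {"a": [1.0, 5.0, 2.0], "b": [3.0, 4.0, 2.5], "c": [2.0, 0.5, -1.0]}
)

values = df.to_numpy()
row_max = values.max(axis=1, keepdims=True)  # shape (n_rows, 1), broadcasts across the columns
binary = np.where(values == row_max, 1, 0)   # 1 where a cell equals its row maximum, else 0

# Wrap the array back into a DataFrame with the original labels.
result = pd.DataFrame(binary, index=df.index, columns=df.columns)
print(result)
```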