Suppose I have a pandas DataFrame.
df = pd.DataFrame(index = [ix for ix in range(10)], columns=list('abcdef'), data=np.random.randn(10,6))
df:
          a         b         c         d         e         f
0 -1.238393 -0.755117 -0.228638 -0.077966  0.412947  0.887955
1 -0.342087  0.296171  0.177956  0.701668 -0.481744 -1.564719
2  0.610141  0.963873 -0.943182 -0.341902  0.326416  0.818899
3 -0.561572  0.063588 -0.195256 -1.637753  0.622627  0.845801
4 -2.506322 -1.631023  0.506860  0.368958  1.833260  0.623055
5 -1.313919 -1.758250 -1.082072  1.266158  0.427079 -1.018416
6 -0.781842  1.270133 -0.510879 -1.438487 -1.101213 -0.922821
7 -0.456999  0.234084  1.602635  0.611378 -1.147994  1.204318
8  0.497074  0.412695 -0.458227  0.431758  0.514382 -0.479150
9 -1.289392 -0.218624  0.122060  2.000832 -1.694544  0.773330
How can I set the row-wise max to 1 and all other elements to 0?
I came up with:
>>> for i in range(len(df)):
...     df.loc[i][df.loc[i].idxmax(axis=1)] = 1
...     df.loc[i][df.loc[i] != 1] = 0
Resulting df:
   a  b  c  d  e  f
0  0  0  0  0  0  1
1  0  0  0  1  0  0
2  0  1  0  0  0  0
3  0  0  0  0  0  1
4  0  0  0  0  1  0
5  0  0  0  1  0  0
6  0  1  0  0  0  0
7  0  0  1  0  0  0
8  0  0  0  0  1  0
9  0  0  0  1  0  0
Does anyone have a better way of doing this, perhaps by getting rid of the for loop or by applying a lambda?
Answer 0: (score: 1)
Use max to get each row's maximum, check for equality with eq, and cast the resulting boolean df to int with astype, which converts True and False to 1 and 0:
In [21]:
df = pd.DataFrame(index = [ix for ix in range(10)], columns=list('abcdef'), data=np.random.randn(10,6))
df
Out[21]:
          a         b         c         d         e         f
0  0.797000  0.762125 -0.330518  1.117972  0.817524  0.041670
1  0.517940  0.357369 -1.493552 -0.947396  3.082828  0.578126
2  1.784856  0.672902 -1.359771 -0.090880 -0.093100  1.099017
3 -0.493976 -0.390801 -0.521017  1.221517 -1.303020  1.196718
4  0.687499 -2.371322 -2.474101 -0.397071  0.132205  0.034631
5  0.573694 -0.206627 -0.106312 -0.661391 -0.257711 -0.875501
6 -0.415331  1.185901  1.173457  0.317577 -0.408544 -1.055770
7 -1.564962 -0.408390 -1.372104 -1.117561 -1.262086 -1.664516
8 -0.987306  0.738833 -1.207124  0.738084  1.118205 -0.899086
9  0.282800 -1.226499  1.658416 -0.381222  1.067296 -1.249829
In [22]:
df = df.eq(df.max(axis=1), axis=0).astype(int)
df
Out[22]:
   a  b  c  d  e  f
0  0  0  0  1  0  0
1  0  0  0  0  1  0
2  1  0  0  0  0  0
3  0  0  0  1  0  0
4  1  0  0  0  0  0
5  1  0  0  0  0  0
6  0  1  0  0  0  0
7  0  1  0  0  0  0
8  0  0  0  0  1  0
9  0  0  1  0  0  0
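One caveat worth keeping in mind: comparing against the row maximum with eq marks every cell that ties for the maximum, whereas the idxmax loop in the question marks only the first one. With random floats ties are unlikely, but they can matter for integer or rounded data. Below is a minimal sketch of the difference; the frame `tied` and the other variable names are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has a tie for the maximum, row 1 does not.
tied = pd.DataFrame({"a": [2.0, 0.5], "b": [2.0, 1.5], "c": [1.0, -3.0]})

# eq/max approach: every cell equal to the row maximum becomes 1.
all_max = tied.eq(tied.max(axis=1), axis=0).astype(int)

# Variant that keeps only the first maximum per row (like idxmax):
# argmax returns the position of the first maximum in each row.
first_pos = tied.to_numpy().argmax(axis=1)
one_hot = np.zeros(tied.shape, dtype=int)
one_hot[np.arange(len(tied)), first_pos] = 1
first_max = pd.DataFrame(one_hot, index=tied.index, columns=tied.columns)

print(all_max)
print(first_max)
```

If exactly one 1 per row is required, the argmax variant is probably the safer choice; otherwise the eq version is shorter.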
Timings: you can see that my method is roughly 12 times faster than @Raihan's method.
In [24]:
# @Raihan Masud's method
%timeit df.apply( lambda x: np.where(x == x.max() , 1 , 0) , axis = 1)
# mine
%timeit df.eq(df.max(axis=1), axis=0).astype(int)
100 loops, best of 3: 7.94 ms per loop
1000 loops, best of 3: 640 µs per loop
In [25]:
%%timeit
# @Nader Hisham's method
def max_binary(df):
    binary = np.where( df == df.max() , 1 , 0 )
    return binary
df.apply( max_binary , axis = 1)
100 loops, best of 3: 9.63 ms per loop
In [4]:
%%timeit
for i in range(len(df)):
    df.loc[i][df.loc[i].idxmax(axis=1)] = 1
    df.loc[i][df.loc[i] != 1] = 0
10 loops, best of 3: 21.1 ms per loop
The for loop is also significantly slower.
Answer 1: (score: 0)
import numpy as np
def max_binary(df):
    binary = np.where( df == df.max() , 1 , 0 )
    return binary
df.apply( max_binary , axis = 1)
Answer 2: (score: 0)
Following Nader's pattern, here is a shorter version:
df.apply( lambda x: np.where(x == x.max() , 1 , 0) , axis = 1)
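A note for readers on newer pandas versions (to my knowledge, 0.23 and later): when the applied function returns an ndarray, apply with axis=1 appears to return a Series of arrays rather than a DataFrame. The sketch below assumes such a version and passes result_type="broadcast" to keep the original shape and labels. The frame `demo` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame, used only for illustration.
demo = pd.DataFrame({"a": [0.1, 2.0], "b": [0.7, -1.0], "c": [0.3, 0.5]})

# result_type="broadcast" keeps the original index and columns;
# astype(int) is applied afterwards because the broadcast result
# takes the dtype of the original frame (float here).
via_apply = demo.apply(
    lambda x: np.where(x == x.max(), 1, 0),
    axis=1,
    result_type="broadcast",
).astype(int)

# The vectorized eq/max approach from Answer 0, for comparison.
via_eq = demo.eq(demo.max(axis=1), axis=0).astype(int)

print(via_apply)
print(via_eq)
```

Both frames should hold the same values; the eq version avoids the per-row Python call, which is consistent with the timings shown in Answer 0.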