How to get the column based on the 3rd occurrence of a value in a given row?

Time: 2018-07-10 00:57:36

Tags: python python-3.x pandas

I need to create a data frame with 10 columns (floats), and I need to make sure that every row has 5 NaN values.

The data frame I want to create:

A    B    C    D    E    F     G    H     I    J
1.0  NaN  2.0  NaN  NaN  NaN   NaN  5.0   6.0  7.0
NaN  NaN  NaN  3.0  5.0  NaN   NaN  5.0   6.0  7.0
1.0  2.0  3.0  5.0  8.0  NaN   NaN  NaN   NaN  NaN
1.0  NaN  3.0  NaN  8.0  10.0  NaN  12.0  NaN  NaN
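As a sketch, the sample table above can be typed in as a pandas DataFrame, with np.nan standing in for the missing cells. Counting the NaNs per row confirms that every row has exactly 5 missing and 5 valid values. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's sample table; nan marks the missing cells.
df = pd.DataFrame(
    [
        [1.0, nan, 2.0, nan, nan, nan, nan, 5.0, 6.0, 7.0],
        [nan, nan, nan, 3.0, 5.0, nan, nan, 5.0, 6.0, 7.0],
        [1.0, 2.0, 3.0, 5.0, 8.0, nan, nan, nan, nan, nan],
        [1.0, nan, 3.0, nan, 8.0, 10.0, nan, 12.0, nan, nan],
    ],
    columns=list("ABCDEFGHIJ"),
)

# Number of missing values in each row; every row should report 5.
nan_counts = df.isna().sum(axis=1)
print(nan_counts.tolist())  # [5, 5, 5, 5]
```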

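I want to create a dataset like this, where each row has 5 NaN values and 5 valid values. Then, for each row, I want to return the column in which the third NaN occurs.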

  Expected output:
  E (the 3rd NaN in the 1st row)
  C (the 3rd NaN in the 2nd row)
  H (the 3rd NaN in the 3rd row)
  G (the 3rd NaN in the 4th row)
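For reference, here is a plain-Python version with no pandas and no vectorization. It makes the rule explicit: walk each row from left to right, count the NaNs, and stop at the third one. The function name third_nan_column is hypothetical. Returning None for rows with fewer than three NaNs is an assumed convention, since the question guarantees 5 per row.

```python
import math


def third_nan_column(row, columns, n=3):
    """Return the label of the column holding the n-th NaN in row, or None."""
    seen = 0
    for label, value in zip(columns, row):
        if isinstance(value, float) and math.isnan(value):
            seen += 1
            if seen == n:
                return label
    return None  # fewer than n NaNs in this row


nan = float("nan")
columns = list("ABCDEFGHIJ")
rows = [
    [1.0, nan, 2.0, nan, nan, nan, nan, 5.0, 6.0, 7.0],
    [nan, nan, nan, 3.0, 5.0, nan, nan, 5.0, 6.0, 7.0],
    [1.0, 2.0, 3.0, 5.0, 8.0, nan, nan, nan, nan, nan],
    [1.0, nan, 3.0, nan, 8.0, 10.0, nan, 12.0, nan, nan],
]
print([third_nan_column(r, columns) for r in rows])  # ['E', 'C', 'H', 'G']
```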

3 answers:

Answer 0 (score: 3)

Use cumsum together with argmax:

df.columns[np.argmax(df.isnull().cumsum(1).eq(3).values,1)]
Out[788]: Index(['E', 'C', 'H', 'G'], dtype='object')
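One caveat with this approach: np.argmax returns the first position of the maximum. If a row has fewer than three NaNs, its boolean row is all False, so argmax returns 0 and the result silently names column A. The idxmax answer below behaves the same way. Here is a minimal guarded variant, assuming you want None for such rows. It computes the running count with np.cumsum on the underlying NumPy array, which is a swap from DataFrame.cumsum, and then checks .any(axis=1).

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [
        [1.0, nan, 2.0, nan, nan, nan, nan, 5.0, 6.0, 7.0],  # 3rd NaN in E
        [nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],  # only one NaN
    ],
    columns=list("ABCDEFGHIJ"),
)

# Running NaN count along each row; True wherever that count equals 3.
hit = np.cumsum(df.isna().to_numpy(), axis=1) == 3
first = np.argmax(hit, axis=1)  # first True per row, or 0 if there is none
has_third = hit.any(axis=1)     # rows that really contain a third NaN

result = [df.columns[i] if ok else None for i, ok in zip(first, has_third)]
print(result)  # ['E', None]
```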

To create the data frame:

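import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 10), columns=list('ABCDEFGHIJ'))
for x in range(len(df)):
    # set 5 distinct, randomly chosen columns of row x to NaN
    df.iloc[x, np.random.choice(10, 5, replace=False)] = np.nan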
df
Out[783]: 
          A         B         C         D   E         F         G         H  \
0  1.263644       NaN -0.427018       NaN NaN  0.160732  0.033323 -1.285068   
1       NaN  2.713568 -0.964603  1.456543 NaN       NaN  0.201837  1.034501   
2       NaN       NaN       NaN -0.262311 NaN  0.361472 -0.089562  0.478207   
3       NaN  1.497916 -0.324090       NaN NaN       NaN  0.711363 -0.094587   
    I         J  
0 NaN       NaN  
1 NaN       NaN  
2 NaN  0.944062  
3 NaN -0.298129  

Answer 1 (score: 1)

Use isnull to find all the null cells, cumsum along axis=1 to increment the count, filter where the null count equals 3, and use idxmax with axis=1 to get the column names.
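The steps above can be traced on the first row of the question's sample data. Note that "== 3" is True from the third NaN up to, but not including, the fourth NaN. idxmax then picks the first True column. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# First row of the question's sample data.
df = pd.DataFrame([[1.0, nan, 2.0, nan, nan, nan, nan, 5.0, 6.0, 7.0]],
                  columns=list("ABCDEFGHIJ"))

is_null = df.isnull()              # True where the cell is NaN
running = is_null.cumsum(axis=1)   # running NaN count along the row
third = running == 3               # True while the count equals 3
print(running.iloc[0].tolist())    # [0, 1, 1, 2, 3, 4, 5, 5, 5, 5]
print(third.idxmax(axis=1).tolist())  # ['E'], the first column where it is True
```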

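(df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)

You can create a random data frame with 5 values and 5 nulls per row using a helper function. Note that the values are floats drawn from the standard normal distribution; you can swap in another random distribution of your choice.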
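Here is a minimal hypothetical helper along those lines. make_half_nan_frame is an illustrative name, not the answerer's function. It draws standard-normal floats with NumPy's default_rng. It then blanks out 5 distinct columns per row without a Python loop by argsorting a row of i.i.d. random keys, which gives a random permutation of 0..9; the positions holding values below 5 are exactly 5 distinct columns. To change the distribution, replace rng.standard_normal with another Generator method.

```python
import numpy as np
import pandas as pd


def make_half_nan_frame(n_rows, seed=None):
    """Hypothetical helper: 10 float columns, exactly 5 NaNs and 5 values per row."""
    rng = np.random.default_rng(seed)
    values = rng.standard_normal((n_rows, 10))  # standard normal floats
    # Each row of argsort(random keys) is a random permutation of 0..9,
    # so "< 5" marks exactly 5 randomly placed columns in every row.
    nan_mask = rng.random((n_rows, 10)).argsort(axis=1) < 5
    values[nan_mask] = np.nan
    return pd.DataFrame(values, columns=list("ABCDEFGHIJ"))


df = make_half_nan_frame(4, seed=42)
print(df.isna().sum(axis=1).tolist())  # [5, 5, 5, 5]
```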

Answer 2 (score: 0)

Out of curiosity, I used %timeit to compare the performance of the two slightly different approaches from @Wen and @HaleemurAli.
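To repeat a comparison like this outside IPython, here is a rough sketch using the standard-library timeit module. The frame size (10,000 rows), the seed and the repeat count are arbitrary assumptions. Your numbers will depend on your machine and your pandas version, so no results are claimed here. The function names wen and haleemur are just labels for the two answers' one-liners.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 10_000
values = rng.standard_normal((n_rows, 10))
values[rng.random((n_rows, 10)).argsort(axis=1) < 5] = np.nan  # 5 NaNs per row
df = pd.DataFrame(values, columns=list("ABCDEFGHIJ"))


def wen(df):
    # @Wen: cumsum + eq(3), then np.argmax on the underlying array
    return df.columns[np.argmax(df.isnull().cumsum(axis=1).eq(3).values, axis=1)]


def haleemur(df):
    # @HaleemurAli: cumsum + == 3, then idxmax along each row
    return (df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)


# Check that both approaches agree before timing them.
assert list(wen(df)) == haleemur(df).tolist()

for name, fn in [("argmax", wen), ("idxmax", haleemur)]:
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f"{name}: {seconds / 20 * 1e3:.2f} ms per call")
```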