I need to create a data frame with 10 columns (floats) and make sure that every row has exactly 5 NaN values.
Data frame I want to create:
A B C D E F G H I J
1.0 Nan 2.0 Nan Nan Nan Nan 5.0 6.0 7.0
Nan Nan Nan 3.0 5.0 Nan Nan 5.0 6.0 7.0
1.0 2.0 3.0 5.0 8.0 Nan Nan Nan Nan Nan
1.0 Nan 3.0 Nan 8.0 10.0 Nan 12.0 Nan Nan
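I want to create this kind of dataset, where each row has 5 NaN values and 5 valid values. For each row, I then want to return the name of the column that holds the third occurrence of NaN in that row, as a series.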
Expected Output
E (it has 3rd occurrence of Nan value in 1st row)
C (it has 3rd occurrence of Nan value in 2nd row)
H (it has 3rd occurrence of Nan value in 3rd row)
G (it has 3rd occurrence of Nan value in 4th row)
Answer 0 (score: 3)
Use cumsum together with argmax:
df.columns[np.argmax(df.isnull().cumsum(1).eq(3).values,1)]
Out[788]: Index(['E', 'C', 'H', 'G'], dtype='object')
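To make the one-liner above easier to follow, here is a minimal step-by-step sketch run on the sample frame from the question. The intermediate variable names (null_mask, running_count, hits_three, first_hit) are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the question: 5 NaN and 5 valid values per row.
df = pd.DataFrame(
    [
        [1.0, nan, 2.0, nan, nan, nan, nan, 5.0, 6.0, 7.0],
        [nan, nan, nan, 3.0, 5.0, nan, nan, 5.0, 6.0, 7.0],
        [1.0, 2.0, 3.0, 5.0, 8.0, nan, nan, nan, nan, nan],
        [1.0, nan, 3.0, nan, 8.0, 10.0, nan, 12.0, nan, nan],
    ],
    columns=list("ABCDEFGHIJ"),
)

null_mask = df.isnull()                       # True where a cell is NaN
running_count = null_mask.cumsum(axis=1)      # NaNs seen so far, left to right
hits_three = running_count.eq(3).values       # True from the 3rd NaN until the 4th
first_hit = np.argmax(hits_three, axis=1)     # position of the first True per row
third_nan_cols = df.columns[first_hit]

print(list(third_nan_cols))  # ['E', 'C', 'H', 'G']
```

One caveat worth keeping in mind: np.argmax returns 0 for a row that contains no True at all, so a row with fewer than three NaN values would be reported as the first column. That cannot happen when every row is guaranteed to hold five NaN values, as in the question.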
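To create the data frame:
import numpy as np
import pandas as pd
df=pd.DataFrame(np.random.randn(4, 10),columns=list('ABCDEFGHIJ'))
for x in range(len(df)):
    # blank out 5 distinct, randomly chosen columns in row x
    df.iloc[x,np.random.choice(10, 5, replace=False)]=np.nan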
df
Out[783]:
A B C D E F G H \
0 1.263644 NaN -0.427018 NaN NaN 0.160732 0.033323 -1.285068
1 NaN 2.713568 -0.964603 1.456543 NaN NaN 0.201837 1.034501
2 NaN NaN NaN -0.262311 NaN 0.361472 -0.089562 0.478207
3 NaN 1.497916 -0.324090 NaN NaN NaN 0.711363 -0.094587
I J
0 NaN NaN
1 NaN NaN
2 NaN 0.944062
3 NaN -0.298129
Answer 1 (score: 1)
Find all the null cells with isnull, keep a running count along each row with cumsum and axis=1, filter for where the null count equals 3, and get the column name with idxmax and axis=1.
(df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)
You can create a random data frame with 5 values and 5 nulls per row with a small helper function. Note that the values are drawn from a standard normal distribution, so they are floats; you can substitute another random distribution of your choice.
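The helper function mentioned in this answer is not shown in the text, so what follows is only a sketch of what such a helper could look like, not the answerer's own code. The name make_half_null_frame and its parameters (n_rows, columns, n_nulls, seed) are hypothetical, and the use of NumPy's Generator API is an assumption made for reproducibility.

```python
import numpy as np
import pandas as pd


def make_half_null_frame(n_rows, columns="ABCDEFGHIJ", n_nulls=5, seed=None):
    """Hypothetical helper: random floats with exactly n_nulls NaN per row."""
    columns = list(columns)
    n_cols = len(columns)
    rng = np.random.default_rng(seed)
    # Standard normal floats; swap in another distribution if preferred.
    values = rng.standard_normal((n_rows, n_cols))
    # Sorting random noise per row gives a random permutation of the column
    # positions; the first n_nulls entries are distinct columns to blank out.
    null_cols = rng.random((n_rows, n_cols)).argsort(axis=1)[:, :n_nulls]
    np.put_along_axis(values, null_cols, np.nan, axis=1)
    return pd.DataFrame(values, columns=columns)


df = make_half_null_frame(4, seed=0)
third_nan = (df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)
print(df.isnull().sum(axis=1).tolist())  # [5, 5, 5, 5]
print(third_nan)
```

Picking the NaN positions from a per-row permutation avoids the Python-level loop over rows, which may matter for large frames; for a handful of rows, a plain loop with np.random.choice(..., replace=False) works just as well.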
Answer 2 (score: 0)
Out of curiosity about timing performance, I ran %timeit on the two slightly different approaches from @Wen and @HaleemurAli:
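The measurements themselves are not included here. As a rough sketch of how such a comparison could be reproduced outside IPython, the standard-library timeit module can stand in for the %timeit magic. The frame size (1000 rows), the repeat count, and the function names with_argmax and with_idxmax are arbitrary choices for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Build a test frame with exactly 5 NaN per row (size chosen arbitrarily).
rng = np.random.default_rng(0)
values = rng.standard_normal((1000, 10))
for row in values:  # each row is a view, so this edits values in place
    row[rng.choice(10, 5, replace=False)] = np.nan
df = pd.DataFrame(values, columns=list("ABCDEFGHIJ"))


def with_argmax(frame):
    # Approach from the first answer: NumPy argmax on the boolean array.
    hits = frame.isnull().cumsum(axis=1).eq(3).values
    return frame.columns[np.argmax(hits, axis=1)]


def with_idxmax(frame):
    # Approach from the second answer: pandas idxmax on the boolean frame.
    return (frame.isnull().cumsum(axis=1) == 3).idxmax(axis=1)


# Both approaches should agree before their timings are compared.
same_result = list(with_argmax(df)) == list(with_idxmax(df))

t_argmax = timeit.timeit(lambda: with_argmax(df), number=20)
t_idxmax = timeit.timeit(lambda: with_idxmax(df), number=20)
print(f"same result: {same_result}")
print(f"argmax: {t_argmax / 20:.6f} s per call")
print(f"idxmax: {t_idxmax / 20:.6f} s per call")
```

Timings depend on the machine, the library versions, and the frame size, so any comparison is best rerun on data shaped like your own.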