I have the following data frame:
      col1  col2        col3
0  1601286   NAN         NAN
1  1601286  1135  2018-12-31
2  1601286   NAN         NAN
3  1601286  1135  2018-12-31
4  1601286   NAN  2018-12-31
5  1601286  1135  2018-12-31
6  1601286  1135  2018-12-31
7  1601286  1135  2018-12-31
8  1601286   NAN         NAN
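I need to validate that only one of these three columns has a value in each row. If more than one column is notnull(), the result should be False.
For example, the output for the data above should be: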
0 True
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
I tried the following, but it does not give the desired result:
df = df[['col1', 'col2', 'col3']].notnull().any(axis=1)
How should I approach this?
Answer 0 (score: 4)
df.isnull().sum(1).eq(2)
Or:
df.isnull().sum(1).gt(1)
Or:
df.notnull().sum(1).lt(2)
Or:
df.notnull().sum(1).eq(1)
0 True
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
dtype: bool
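To reproduce this end to end, here is a minimal sketch that rebuilds the question's data and compares the attempt from the question with a count-based check. It assumes the NAN entries are genuinely missing values and that the dates are stored as strings; the names has_any and exactly_one are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Question's data, assuming "NAN" stands for a real missing value.
df = pd.DataFrame(
    {
        "col1": [1601286] * 9,
        "col2": [np.nan, 1135, np.nan, 1135, np.nan, 1135, 1135, 1135, np.nan],
        "col3": [None, "2018-12-31", None, "2018-12-31", "2018-12-31",
                 "2018-12-31", "2018-12-31", "2018-12-31", None],
    }
)

# The attempt from the question: True whenever at least one column is filled.
# col1 is always filled, so every row comes out True.
has_any = df[["col1", "col2", "col3"]].notnull().any(axis=1)

# Count the non-missing cells per row and require exactly one.
exactly_one = df.notnull().sum(axis=1).eq(1)

print(exactly_one)
```

The difference is that any(axis=1) only asks whether a value exists, while summing the boolean mask counts how many exist, which is what the validation needs.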
Answer 1 (score: 1)
import numpy as np
import pandas as pd
# Dates are quoted: a bare 2018-12-31 is evaluated as the subtraction 2018 - 12 - 31 = 1975.
df = pd.DataFrame([[1601286, np.nan, np.nan],
[1601286, 1135, '2018-12-31'],
[1601286, np.nan, np.nan],
[1601286, 1135, '2018-12-31'],
[1601286, np.nan, '2018-12-31'],
[1601286, 1135, '2018-12-31'],
[1601286, 1135, '2018-12-31'],
[1601286, 1135, '2018-12-31'],
[1601286, np.nan, np.nan]], columns=['col1', 'col2', 'col3'])
df['count_notnull'] = df.count(axis=1)  # Count of non-null values in each row.
df['bool'] = df['count_notnull'].map(lambda x: x == 1)  # We need exactly one non-null value,
# so we test that condition here.
df
      col1    col2        col3  count_notnull   bool
0  1601286     NaN         NaN              1   True
1  1601286  1135.0  2018-12-31              3  False
2  1601286     NaN         NaN              1   True
3  1601286  1135.0  2018-12-31              3  False
4  1601286     NaN  2018-12-31              2  False
5  1601286  1135.0  2018-12-31              3  False
6  1601286  1135.0  2018-12-31              3  False
7  1601286  1135.0  2018-12-31              3  False
8  1601286     NaN         NaN              1   True
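If the intermediate count column is not needed, the same test can probably be written as a single vectorised expression. The sketch below is one way to do it, assuming the three column names from the question and a shortened three-row sample; only_one and cols are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Shortened sample in the shape of the question's data (an assumption).
df = pd.DataFrame(
    [[1601286, np.nan, None],
     [1601286, 1135, "2018-12-31"],
     [1601286, np.nan, "2018-12-31"]],
    columns=["col1", "col2", "col3"],
)

# Restrict the check to the columns being validated, so that helper
# columns added later are never counted by accident.
cols = ["col1", "col2", "col3"]

# DataFrame.count(axis=1) returns the number of non-missing cells per row.
df["only_one"] = df[cols].count(axis=1).eq(1)

print(df)
```

Selecting the columns explicitly also keeps the result stable if the statement is run a second time after only_one has been added to the frame.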
Answer 2 (score: 1)
Use DataFrame.isnull together with DataFrame.sum, and then check your condition. For example:
import pandas as pd
import numpy as np
d = {'A': [1, 2, 3, np.nan, 5], 'B': [1, 2, np.nan, np.nan, 5], 'C': [1, 2, np.nan, np.nan, np.nan]}
print(pd.DataFrame(d).isnull().sum(axis=1)>1)
Output
0 False
1 False
2 True
3 True
4 False
dtype: bool
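Note that a "more than one null" test also marks row 3 as True, even though that row has no values at all. If the requirement is strictly one value per row, a sketch along these lines may be closer; it reuses the sample data from this answer, and the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

d = {"A": [1, 2, 3, np.nan, 5],
     "B": [1, 2, np.nan, np.nan, 5],
     "C": [1, 2, np.nan, np.nan, np.nan]}
frame = pd.DataFrame(d)

# Number of missing cells in each row.
null_count = frame.isnull().sum(axis=1)

# The condition used above: two or more missing cells.
more_than_one_null = null_count > 1

# Exactly one value present means all columns but one are missing.
exactly_one_value = null_count == frame.shape[1] - 1

print(pd.DataFrame({"more_than_one_null": more_than_one_null,
                    "exactly_one_value": exactly_one_value}))
```

Using frame.shape[1] rather than a hard-coded 2 keeps the comparison correct if the number of columns changes.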