Check that only one column has a value in a DataFrame

Time: 2018-12-27 12:45:41

Tags: python pandas dataframe

I have the following DataFrame:

          col1                col2         col3
0          1601286            NAN         NAN
1          1601286            1135         2018-12-31
2          1601286            NAN         NAN
3          1601286            1135         2018-12-31
4          1601286            NAN         2018-12-31
5          1601286            1135         2018-12-31
6          1601286            1135         2018-12-31
7          1601286            1135         2018-12-31
8          1601286            NAN        NAN

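I need to verify that, in each row, only one of these three columns has a value. If more than one column is notnull(), the result should be False.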

For example, the output for the data above should be:

0 True
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True

I tried the following, but I am not sure it is the right approach:

df= df[['col1', 'col2', 'col3']].notnull().any(axis=1)

How should I approach this?
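For reference, here is a minimal sketch of what the attempt above computes. It uses three representative rows from the table (the date is assumed to be stored as a string). Because `any(axis=1)` only asks whether at least one column is non-null, every row comes out True.

```python
import numpy as np
import pandas as pd

# Three representative rows: one, three, and two values present.
df = pd.DataFrame({
    "col1": [1601286, 1601286, 1601286],
    "col2": [np.nan, 1135, np.nan],
    "col3": [np.nan, "2018-12-31", "2018-12-31"],
})

# any(axis=1) is True as soon as a single column is non-null,
# so it cannot tell "exactly one value" from "several values".
at_least_one = df[["col1", "col2", "col3"]].notnull().any(axis=1)
print(at_least_one.tolist())
```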

3 answers:

Answer 0: (score: 4)

Use isnull with sum. With three columns, exactly one value in a row means exactly two nulls:

df.isnull().sum(1).eq(2)

Or:

df.isnull().sum(1).gt(1)

Or:

df.notnull().sum(1).lt(2)

Or:

df.notnull().sum(1).eq(1)

0     True
1    False
2     True
3    False
4    False
5    False
6    False
7    False
8     True
dtype: bool
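The four expressions agree on the question's data because col1 is never null there. The sketch below uses a hypothetical frame (`demo`, not from the question) with an extra all-null row to show where they differ. Only the `eq` variants test for exactly one value, while `lt(2)` and `gt(1)` also accept a row with no values at all.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with 1, 3, 2, and 0 non-null values respectively.
demo = pd.DataFrame({
    "col1": [1601286, 1601286, 1601286, np.nan],
    "col2": [np.nan, 1135, np.nan, np.nan],
    "col3": [np.nan, "2018-12-31", "2018-12-31", np.nan],
})

n_values = demo.notnull().sum(axis=1)  # non-null count per row
n_nulls = demo.isnull().sum(axis=1)    # null count per row

exactly_one = n_values.eq(1)           # strict: exactly one value
at_most_one = n_values.lt(2)           # also True for the all-null row
two_nulls = n_nulls.eq(2)              # same as exactly_one for three columns
more_than_one_null = n_nulls.gt(1)     # same as at_most_one for three columns

print(pd.DataFrame({
    "eq(1)": exactly_one,
    "lt(2)": at_most_one,
    "isnull eq(2)": two_nulls,
    "isnull gt(1)": more_than_one_null,
}))
```

If rows with no values at all can occur and should count as False, the `eq` forms are probably the safer choice.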

Answer 1: (score: 1)

import numpy as np
import pandas as pd

# Dates are quoted: a bare 2018-12-31 is integer arithmetic and evaluates to 1975.
df = pd.DataFrame([[1601286, np.nan, np.nan],
                   [1601286, 1135, '2018-12-31'],
                   [1601286, np.nan, np.nan],
                   [1601286, 1135, '2018-12-31'],
                   [1601286, np.nan, '2018-12-31'],
                   [1601286, 1135, '2018-12-31'],
                   [1601286, 1135, '2018-12-31'],
                   [1601286, 1135, '2018-12-31'],
                   [1601286, np.nan, np.nan]], columns=['col1', 'col2', 'col3'])

df['count_notnull'] = df.count(axis=1)                  # Number of non-null values per row.
df['bool'] = df['count_notnull'].map(lambda x: x == 1)  # True only when exactly one value is present.

df

      col1    col2        col3  count_notnull   bool
0  1601286     NaN         NaN              1   True
1  1601286  1135.0  2018-12-31              3  False
2  1601286     NaN         NaN              1   True
3  1601286  1135.0  2018-12-31              3  False
4  1601286     NaN  2018-12-31              2  False
5  1601286  1135.0  2018-12-31              3  False
6  1601286  1135.0  2018-12-31              3  False
7  1601286  1135.0  2018-12-31              3  False
8  1601286     NaN         NaN              1   True

Answer 2: (score: 1)

Use DataFrame.isnull together with DataFrame.sum, and then check your condition. For example:

import pandas as pd
import numpy as np

d = {'A':[1, 2, 3, np.nan, 5], 'B':[1, 2, np.nan, np.nan, 5], 'C':[1, 2, np.nan, np.nan, np.nan]}
print(pd.DataFrame(d).isnull().sum(axis=1)>1)

Output

0    False
1    False
2     True
3     True
4    False
dtype: bool
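The example above flags rows with more than one null. To apply the same isnull-plus-sum idea to the question's data, one option is a small helper that restricts the check to chosen columns. This is only a sketch: `exactly_one_value` is a hypothetical name, not a pandas function, and the `note` column is invented to show that unrelated columns are ignored.

```python
import numpy as np
import pandas as pd


def exactly_one_value(frame, columns):
    """Return a boolean Series: True where exactly one of `columns` is non-null."""
    # Exactly one value is the same as all but one of the columns being null.
    return frame[columns].isnull().sum(axis=1).eq(len(columns) - 1)


# The question's data, with dates assumed to be strings.
df = pd.DataFrame({
    "col1": [1601286] * 9,
    "col2": [np.nan, 1135, np.nan, 1135, np.nan, 1135, 1135, 1135, np.nan],
    "col3": [np.nan, "2018-12-31", np.nan, "2018-12-31", "2018-12-31",
             "2018-12-31", "2018-12-31", "2018-12-31", np.nan],
    "note": ["x"] * 9,  # hypothetical extra column, excluded from the check
})

result = exactly_one_value(df, ["col1", "col2", "col3"])
print(result)
```

Passing the column list explicitly keeps helper or unrelated columns from being counted.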