Question

I am writing unit tests for my data and am having trouble writing a Pythonic range check.

I have a pandas DataFrame:

import pandas as pd

d = {'one': pd.Series([.14, .52, 1.], index=['a', 'b', 'c']),
     'two': pd.Series([.57, .25, .33, .98], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

Now, I want to verify that the values in these columns fall within the range [0, 1]. I'd like a function:

check_data(df, column)

that just returns True if the data does fall in the range and False if it doesn't. So in my example data, check_data(df, 'one') returns False, check_data(df, 'two') returns True.

My head is trying to take on a row by row approach (thank my years of Excel VBA), but I know that's wrong. Anyone got a better approach?

Answer 1

You can use between and all to check individual columns:

>>> df['one'].between(0, 1).all()
False
>>> df['two'].between(0, 1).all()
True

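between includes the endpoints by default; to exclude them, pass inclusive='neither' (pandas 1.3 and later; older versions used inclusive=False).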
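To see what the endpoint setting changes, here is a minimal sketch using only the three non-missing values from column 'one'. It assumes pandas 1.3 or later, where inclusive takes a string; the names s, closed_ok and open_ok are just for illustration.

```python
import pandas as pd

# The non-missing values of column 'one'; note that 1.0 sits exactly on the upper bound.
s = pd.Series([.14, .52, 1.], index=['a', 'b', 'c'])

# Default behaviour: both endpoints count as inside the range.
closed_ok = bool(s.between(0, 1).all())

# Open interval (0, 1): the value 1.0 is now rejected.
open_ok = bool(s.between(0, 1, inclusive='neither').all())

print(closed_ok, open_ok)
```

With the default setting the value 1.0 passes; with the endpoints excluded it does not.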

If you prefer, you can also check every column of the DataFrame at once:

>>> ((0 <= df) & (df <= 1)).all()
one    False
two     True
dtype: bool
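Putting the per-column check into the check_data(df, column) function from the question could look like the sketch below. The low and high parameters are an addition for illustration, not part of the original request. Note that column 'one' fails here because row 'd' is NaN after the two Series are aligned, and between treats NaN as out of range.

```python
import pandas as pd

def check_data(df, column, low=0.0, high=1.0):
    """Return True if every value in df[column] lies in [low, high]."""
    # between() is inclusive on both ends by default. NaN compares as False,
    # so a column containing missing values fails the check.
    return bool(df[column].between(low, high).all())

d = {'one': pd.Series([.14, .52, 1.], index=['a', 'b', 'c']),
     'two': pd.Series([.57, .25, .33, .98], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

result_one = check_data(df, 'one')           # row 'd' is NaN in column 'one'
result_two = check_data(df, 'two')
result_one_no_nan = check_data(df.dropna(), 'one')  # same check with NaN rows dropped

print(result_one, result_two, result_one_no_nan)
```

If missing values should not count as failures, dropping them first (as in the last call) is one option; whether that is appropriate depends on what the unit test is meant to guarantee.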
