Comparing columns with pandas_schema

Time: 2020-09-11 15:38:57

Tags: python-3.x pandas dataframe validation schema

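I'm using Python 3.8 and pandas_schema to run integrity checks on my data. I have a requirement that workflow_next_step must never be the same as workflow_entry_step. I tried to write a CustomSeriesValidation that compares the two columns, because I can't see a stock function that does this.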

Is there a way to compare two cell values in the same row with pandas_schema? In this example, pandas_schema should return an error for Mary, because she moved from In Progress to In Progress.

df = config.pd.DataFrame({
    'prospect': ['Bob', 'Jill', 'Steve', 'Mary'],
    'value': [10000, 15000, 500, 50000],
    'workflow_entry_step': ['New', 'In Progress', 'Closed', 'In Progress'],
    'workflow_next_step': ['In Progress', 'Closed', None, 'In Progress']})

schema = Schema([
    Column('prospect', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('value', [CanConvertValidation(int), 'Doesn\'t convert to integer.']),
    Column('workflow_entry_step', [InListValidation([None, 'New', 'In Progress', 'Closed'])]),
    Column('workflow_next_step', [CustomSeriesValidation(lambda x: x != Column('workflow_entry_step'), InListValidation([None, 'New', 'In Progress', 'Closed'])]), 'Steps cannot be the same.')])


1 answer:

Answer 0 (score: 0)

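import pandas as pd
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    CanConvertValidation,
    CustomSeriesValidation,
    InListValidation,
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
)

df = pd.DataFrame({
    'prospect': ['Bob', 'Jill', 'Steve', 'Mary'],
    'value': [10000, 15000, 500, 50000],
    'workflow_entry_step': ['New', 'In Progress', 'Closed', 'In Progress'],
    'workflow_next_step': ['In Progress', 'Closed', None, 'In Progress']})

schema = Schema([
    Column('prospect', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('value', [CanConvertValidation(float)]),
    Column('workflow_entry_step', [InListValidation([None, 'New', 'In Progress', 'Closed'])]),
    Column('workflow_next_step', [
        # Compare against the other column by closing over df; a row is
        # valid only when its next step differs from its entry step.
        CustomSeriesValidation(lambda x: x != df['workflow_entry_step'],
                               'Steps cannot be the same.'),
        InListValidation([None, 'New', 'In Progress', 'Closed'])])
])

# validate() returns a list of validation warnings, one per failing cell.
errors = schema.validate(df)
for error in errors:
    print(error)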

Output:

{row: 3, column: "workflow_next_step"}: "In Progress" Steps cannot be the same.
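
As a quick cross-check of the comparison the validator relies on, the same row-wise test can be run in plain pandas, without pandas_schema. This is only a minimal sketch: the names same_step and offending_rows are illustrative and do not come from the question or the answer. The mask is the inverse of the lambda's condition, so True marks a failing row.

```python
import pandas as pd

df = pd.DataFrame({
    'prospect': ['Bob', 'Jill', 'Steve', 'Mary'],
    'workflow_entry_step': ['New', 'In Progress', 'Closed', 'In Progress'],
    'workflow_next_step': ['In Progress', 'Closed', None, 'In Progress'],
})

# Element-wise comparison of the two columns; True marks a row whose
# next step repeats its entry step (a None next step never matches).
same_step = df['workflow_next_step'] == df['workflow_entry_step']

# Index labels of the rows that break the rule.
offending_rows = df.index[same_step].tolist()
print(offending_rows)
```

With the sample data this flags only Mary's row (index 3), which matches the row reported by the schema.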