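I am using Python 3.8 and pandas_schema to run integrity checks on my data. I have a requirement that workflow_next_step must never be the same as workflow_entry_step. I am trying to build a CustomSeriesValidation that compares the two columns, because I don't see a stock validation that does this.
Is there a way to compare two cell values in the same row using pandas_schema? In this example, pandas_schema should return an error for Mary, because she moved from In Progress to In Progress.
df = config.pd.DataFrame({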
'prospect': ['Bob', 'Jill', 'Steve', 'Mary'],
'value': [10000, 15000, 500, 50000],
'workflow_entry_step': ['New', 'In Progress', 'Closed', 'In Progress'],
'workflow_next_step': ['In Progress', 'Closed' ,None, 'In Progress']})
schema = Schema([
Column('prospect', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
Column('value', [CanConvertValidation(int),'Doesn\'t convert to integer.']),
Column('workflow_entry_step', [InListValidation([None,'New','In Progress','Closed'])]),
Column('workflow_next_step', [CustomSeriesValidation(lambda x: x != Column('workflow_entry_step'), InListValidation([None,'New','In Progress','Closed'])]), 'Steps cannot be the same.')])
Answer 0: (score: 0)
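import pandas as pd
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    CanConvertValidation, CustomSeriesValidation, InListValidation,
    LeadingWhitespaceValidation, TrailingWhitespaceValidation)

df = pd.DataFrame({
    'prospect': ['Bob', 'Jill', 'Steve', 'Mary'],
    'value': [10000, 15000, 500, 50000],
    'workflow_entry_step': ['New', 'In Progress', 'Closed', 'In Progress'],
    'workflow_next_step': ['In Progress', 'Closed', None, 'In Progress']})

# The lambda receives the workflow_next_step Series and compares it
# element-wise with df['workflow_entry_step']; rows where the result
# is False are reported with the given message.
schema = Schema([
    Column('prospect', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('value', [CanConvertValidation(float)]),
    Column('workflow_entry_step', [InListValidation([None, 'New', 'In Progress', 'Closed'])]),
    Column('workflow_next_step', [
        CustomSeriesValidation(lambda x: x != df['workflow_entry_step'],
                               'Steps cannot be the same.'),
        InListValidation([None, 'New', 'In Progress', 'Closed'])])
])

errors = schema.validate(df)
for error in errors:
    print(error)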
Output:
{row: 3, column: "workflow_next_step"}: "In Progress" Steps cannot be the same.
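To see why only row 3 is reported, it may help to look at the comparison the lambda performs on its own, outside pandas_schema. The sketch below uses plain pandas and a trimmed copy of the example data. The names passes, failing_rows and failing_prospects are illustrative only and are not part of either library.

```python
import pandas as pd

df = pd.DataFrame({
    'prospect': ['Bob', 'Jill', 'Steve', 'Mary'],
    'workflow_entry_step': ['New', 'In Progress', 'Closed', 'In Progress'],
    'workflow_next_step': ['In Progress', 'Closed', None, 'In Progress'],
})

# Element-wise comparison of the two columns, row by row:
# True where the steps differ (row passes), False where they match (row fails).
passes = df['workflow_next_step'] != df['workflow_entry_step']

# Rows where the comparison is False are the ones that would be reported.
failing_rows = df.index[~passes].tolist()
failing_prospects = df.loc[~passes, 'prospect'].tolist()

print(failing_rows, failing_prospects)  # [3] ['Mary']
```

Note that the lambda in the schema closes over df, so this approach presumably only behaves as intended when the DataFrame passed to schema.validate has the same index as the df referenced inside the lambda.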