Pandas DataFrame data validation

Time: 2020-10-08 04:36:09

Tags: python python-3.x pandas dataframe validation

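This code uses the pandas_schema package to validate the data in a DataFrame loaded from a CSV file. What I need is code that does not use any validation package; it should use only functions, conditional statements, exceptions, and so on. I tried df['customertype'].isin(['type1','type2']), but it does not work and gives me the error "list indices must be integers or slices, not str". Please help.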

from pandas_schema import Column, Schema
from pandas_schema.validation import (
    InListValidation,
    IsDtypeValidation,
    DateFormatValidation,
    MatchesPatternValidation,
)

schema = Schema([
    # Match a string of length between 1 and 5
    Column('CompanyID', [MatchesPatternValidation(r".{1,5}")]),

    # Match a date-like string of ISO 8601 format (https://www.iso.org/iso-8601-date-and-time-format.html)
    Column('initialdate', [DateFormatValidation("%Y-%m-%d %H:%M:%S")], allow_empty=True),
    
    # Match only strings in the following list
    Column('customertype', [InListValidation(["type1", "type2", "type3"])]),

    # Match an IP address RegEx (https://www.oreilly.com/library/view/regular-expressions-cookbook/9780596802837/ch07s16.html)
    Column('ip', [MatchesPatternValidation(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")]),

    # Match only strings in the following list    
    Column('customersatisfied', [InListValidation(["yes", "no"])], allow_empty=True)
])
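
For comparison, the same five rules could be sketched with plain pandas and the standard library, using only functions, conditionals, and exceptions. This is a minimal sketch, not a drop-in replacement for pandas_schema: the names `validate`, `is_empty`, `is_valid_date`, and `CHECKS` are made up for illustration, the CompanyID rule follows the comment (length 1 to 5) rather than the regex, and the IP rule uses a whole-value match, which may be stricter than the library's behavior. The "list indices must be integers or slices, not str" message is the TypeError Python raises when a list is indexed with a string, so one possible cause is that `df` was a plain list rather than a DataFrame when `.isin` was called; the sketch checks for that up front.

```python
import re
from datetime import datetime

import pandas as pd

# Same pattern as in the schema above, compiled once.
IP_PATTERN = re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")


def is_empty(value):
    """True for NaN/None (what read_csv produces for blank cells) or blank text."""
    return pd.isna(value) or str(value).strip() == ""


def is_valid_date(value, fmt="%Y-%m-%d %H:%M:%S"):
    """True if the value parses with the given strptime format."""
    try:
        datetime.strptime(str(value), fmt)
        return True
    except ValueError:
        return False


# column name -> (check function, allow_empty); hypothetical layout for this sketch
CHECKS = {
    "CompanyID": (lambda v: 1 <= len(str(v)) <= 5, False),
    "initialdate": (is_valid_date, True),
    "customertype": (lambda v: v in ("type1", "type2", "type3"), False),
    "ip": (lambda v: IP_PATTERN.fullmatch(str(v)) is not None, False),
    "customersatisfied": (lambda v: v in ("yes", "no"), True),
}


def validate(df):
    """Return a list of (row_index, column, value) tuples for cells that fail."""
    if not isinstance(df, pd.DataFrame):
        # Indexing a plain list with a column name raises
        # "list indices must be integers or slices, not str".
        raise TypeError(f"expected a pandas DataFrame, got {type(df).__name__}")

    errors = []
    for column, (check, allow_empty) in CHECKS.items():
        if column not in df.columns:
            raise KeyError(f"missing column: {column}")
        for idx, value in df[column].items():
            if is_empty(value):
                if not allow_empty:
                    errors.append((idx, column, value))
                continue
            if not check(value):
                errors.append((idx, column, value))
    return errors
```

Calling `validate(pd.read_csv("data.csv"))` (the file name is a placeholder) returns an empty list when every cell passes. Collecting failures in a list instead of raising on the first bad cell is a design choice for this sketch; it keeps the report complete, and the caller can still raise if the list is non-empty.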

0 answers:

No answers