Validating CSV data in Python

Time: 2018-09-10 18:19:12

Tags: python csv validation

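I have a simple CSV data file with two columns of interest, Object_Id and VALUE. Each entry in the Object_Id column has a corresponding entry at the same index in the VALUE column. My goal is to read these entries and validate them against expected data.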

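I can read the CSV file, but I am not sure how to validate the data. Here is part of the CSV file:

Obj ID,    Value,    Time Stamp
13,    41.0,    2018-09-10 23:05:30
14,    14.0,    2018-09-10 23:05:20
13,    41.0,    2018-09-10 23:05:20
14,    14.0,    2018-09-10 23:05:09

This is the code I am trying. The output I get definitely does not match the expected entries. Can you comment?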

import csv
with open('testoutfile1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    observed_output=[]
    expected_output=[]
    for row in reader:
        #print(';  '.join(row))
        observed_output = {row[0]:row[1]}
        print(observed_output)
expected_output= {'Obj ID': 'Value','13':'41.0', '14':'14.0'}
print(expected_output)

for key in expected_output:
    if key in observed_output:
        print(key)
        print(observed_output[key])
        print(expected_output[key])
        if observed_output[key] == expected_output[key]:
            print("Test Passed")
        elif observed_output[key] != expected_output[key]:
            print("Test Failed")

2 Answers:

Answer 0: (score: 1)

Try this:

from pathlib import Path
import pandas as pd
import csv    

doc = """Obj ID,Value,Time Stamp
13,41.0,2018-09-10 23:05:30
14,14.0,2018-09-10 23:05:20
13,41.0,2018-09-10 23:05:20
14,14.0,2018-09-10 23:05:09"""

#replicate a data file
Path('temp.csv').write_text(doc)  


#read a csv to dicts
def read_dicts(filename, sep=",", names=['id', 'value', 'time']): 
    with open(filename, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=sep, fieldnames=names)
        return [row for row in reader][1:]            
dicts = read_dicts('temp.csv')
# you can start checking *dicts* from here

# use pandas
df = pd.read_csv('temp.csv', names=['id', 'value', 'time'], header=0)
# this is not a great way to check (you lose information), but it seems to be what you ask for
assert df['value'].tolist() == [41.0, 14.0, 41.0, 14.0]

# if the data on objects does not change, check this way; wrap this in a function
assert (df[df.id==13].value == 41).all()
assert (df[df.id==14].value == 14).all()

# you can replicate the above with the csv module too.

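To check the data properly, you need to make explicit assumptions about its structure (for example, can a value change over time?) and adjust the checks accordingly.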
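As one way to make such an assumption explicit, here is a minimal sketch that uses only the standard csv module. It assumes each object ID should always report the same value regardless of timestamp. The function name find_mismatches, the inline sample data, and the expected dictionary are illustrative and not part of the original answer.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for a real file.
SAMPLE = """Obj ID,Value,Time Stamp
13,41.0,2018-09-10 23:05:30
14,14.0,2018-09-10 23:05:20
13,41.0,2018-09-10 23:05:20
14,14.0,2018-09-10 23:05:09
"""


def find_mismatches(csvfile, expected):
    """Return (line_number, obj_id, observed) for each row that disagrees with expected.

    Assumption: every object ID maps to one fixed value, independent of the timestamp.
    """
    # skipinitialspace tolerates "13,    41.0" style padding after the commas.
    reader = csv.reader(csvfile, skipinitialspace=True)
    next(reader)  # skip the header row
    mismatches = []
    for row in reader:
        obj_id, value = row[0], float(row[1])
        # An ID missing from expected yields None, which also counts as a mismatch.
        if expected.get(obj_id) != value:
            mismatches.append((reader.line_num, obj_id, value))
    return mismatches


expected = {'13': 41.0, '14': 14.0}
mismatches = find_mismatches(io.StringIO(SAMPLE), expected)
print("Test Passed" if not mismatches else f"Test Failed: {mismatches}")
```

Collecting the mismatches instead of asserting on the first one keeps the line number of every offending row, which may be more useful when values are allowed to drift over time.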

Answer 1: (score: 0)

I think expected_output should be a simpler dictionary:

expected_output = {'13':'41.0', '14':'14.0'}

Then you can proceed like this:

with open('...') as data:
    next(data)  # skip headers
    for line in data:
        # keep the first two fields, ignore the rest (e.g. the timestamp)
        obj_id, val, *_ = [item.strip() for item in line.split(',')]
        if obj_id in expected_output and val == expected_output[obj_id]:
            # the observed output is the same as expected
            ...
        else:
            # observed is unexpected
            ...
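Note that the comparison above is between strings, so '41' and '41.0' would count as different. If that is not what you want, one option is to compare the fields as numbers. The sketch below is only an illustration: the helper name values_match and the tolerance are assumptions, not part of the original answer.

```python
import math


def values_match(observed, expected, tol=1e-9):
    """Compare two numeric strings as floats instead of as text."""
    try:
        return math.isclose(float(observed), float(expected), abs_tol=tol)
    except ValueError:
        # A non-numeric cell (for example a stray header) never matches.
        return False


expected_output = {'13': '41.0', '14': '14.0'}
line = "13,    41,    2018-09-10 23:05:30"
obj_id, val, *_ = [item.strip() for item in line.split(',')]
print(values_match(val, expected_output[obj_id]))
```

Whether numeric equality is the right test depends on the data. If the file is expected to reproduce the values character for character, the plain string comparison is the stricter check.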