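I have a simple CSV data file with two columns, Object_Id and VALUE, and each entry of Object ID has a corresponding value at the same index in the other column (VALUE). My aim is to read these entries and validate them against expected data.
I am able to read the CSV file, but I am not sure how to validate the data. Here is part of the CSV file: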
Obj ID, Value, Time Stamp
13, 41.0, 2018-09-10 23:05:30
14, 14.0, 2018-09-10 23:05:20
13, 41.0, 2018-09-10 23:05:20
14, 14.0, 2018-09-10 23:05:09
Here is the code I am trying. The output I receive clearly does not compare all of the entries. Can you comment?
import csv

with open('testoutfile1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    observed_output = []
    expected_output = []
    for row in reader:
        #print('; '.join(row))
        observed_output = {row[0]: row[1]}
        print(observed_output)

    expected_output = {'Obj ID': 'Value', '13': '41.0', '14': '14.0'}
    print(expected_output)

for key in expected_output:
    if key in observed_output:
        print(key)
        print(observed_output[key])
        print(expected_output[key])
        if observed_output[key] == expected_output[key]:
            print("Test Passed")
        elif observed_output[key] != expected_output[key]:
            print("Test Failed")
Answer 0 (score: 1)
Try this:
from pathlib import Path
import pandas as pd
import csv

doc = """Obj ID,Value,Time Stamp
13,41.0,2018-09-10 23:05:30
14,14.0,2018-09-10 23:05:20
13,41.0,2018-09-10 23:05:20
14,14.0,2018-09-10 23:05:09"""

# replicate a data file
Path('temp.csv').write_text(doc)

# read a csv to dicts
def read_dicts(filename, sep=",", names=['id', 'value', 'time']):
    with open(filename, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=sep, fieldnames=names)
        # fieldnames are supplied explicitly, so drop the header row by hand
        return [row for row in reader][1:]

dicts = read_dicts('temp.csv')
# you can start checking *dicts* from here

# use pandas
df = pd.read_csv('temp.csv', names=['id', 'value', 'time'], header=0)

# this is not a great way to check (you lose information), but it seems to be what you ask for
assert df['value'].tolist() == [41.0, 14.0, 41.0, 14.0]

# if the data on objects does not change, check this way; write this as a function
assert (df[df.id == 13].value == 41).all()
assert (df[df.id == 14].value == 14).all()

# you can replicate the above with the csv module too.
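To check properly, you need to make explicit assumptions about the data structure (do the values change over time?) and adjust the checks accordingly.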
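As a rough sketch of one such assumption, using only the standard csv module: suppose each object's value must stay constant across all timestamps. The helper name `check_constant_values` and the inline sample data are illustrative and not part of the question; in practice you would pass the opened file instead of a `StringIO`.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; a real run would open the CSV file.
SAMPLE = """Obj ID,Value,Time Stamp
13,41.0,2018-09-10 23:05:30
14,14.0,2018-09-10 23:05:20
13,41.0,2018-09-10 23:05:20
14,14.0,2018-09-10 23:05:09
"""

def check_constant_values(csvfile, expected):
    """Hypothetical helper: return {obj_id: sorted unexpected values}.

    Assumes every row of a given object must carry the same expected value.
    An ID that never appears in the file is reported with an empty list.
    """
    observed = defaultdict(set)
    reader = csv.reader(csvfile, skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if not row:  # ignore blank lines
            continue
        observed[row[0].strip()].add(float(row[1]))

    problems = {}
    for obj_id, value in expected.items():
        wrong = observed.get(obj_id, set()) - {value}
        if wrong or obj_id not in observed:
            problems[obj_id] = sorted(wrong)
    return problems

problems = check_constant_values(io.StringIO(SAMPLE), {"13": 41.0, "14": 14.0})
print(problems or "all values match")
```

Comparing as floats sidesteps formatting differences such as `41` versus `41.0`. If the values are allowed to change over time, this check is the wrong one, and the expected data would need to be keyed by timestamp as well.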
Answer 1 (score: 0)
I think expected_output should be a simpler dictionary:
expected_output = {'13':'41.0', '14':'14.0'}
Next, you can do something like this:
data = open('...')
next(data)  # skip headers
for line in data:
    id, val, *_ = [item.strip() for item in line.split(',')]
    if id in expected_output and val == expected_output[id]:
        # the observed output is the same as expected
        ...
    else:
        # observed is unexpected
        ...
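For reference, here is a filled-in version of that loop as a minimal sketch. It assumes the rows shown in the question and uses `io.StringIO` as a stand-in for the opened file; the `results` list and the `obj_id` name are additions for illustration.

```python
import io

expected_output = {'13': '41.0', '14': '14.0'}

# Stand-in for the opened CSV file, using the rows from the question.
data = io.StringIO(
    "Obj ID, Value, Time Stamp\n"
    "13, 41.0, 2018-09-10 23:05:30\n"
    "14, 14.0, 2018-09-10 23:05:20\n"
    "13, 41.0, 2018-09-10 23:05:20\n"
    "14, 14.0, 2018-09-10 23:05:09\n"
)

results = []
next(data)  # skip headers
for line in data:
    # Split on commas and strip the spaces that follow them in the file.
    obj_id, val, *_ = [item.strip() for item in line.split(',')]
    passed = obj_id in expected_output and val == expected_output[obj_id]
    results.append((obj_id, val, passed))
    print(obj_id, val, "Test Passed" if passed else "Test Failed")
```

Every row is checked here, which is what the original loop missed by overwriting `observed_output` on each iteration. Note that the values are compared as strings, so `41` and `41.0` would count as a mismatch; convert both sides with `float()` if the formatting can vary.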