Python: validate CSV file data against model data

Time: 2018-06-27 14:32:55

Tags: python csv

I have CSV files containing stock price data. A sample is shown below.

'dttm','open','high','low','close'
"2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0
"2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0
"2014/01/01 09:18:00",6359.5,6359.7998046875,6358,6359,0
"2014/01/01 09:19:00",6358.9501953125,6359.4501953125,6357.5498046875,6359,0
"2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0
.....likewise till "2014/01/01 15:30:30"  (and for further dates ahead)

Each row holds one minute of data.

Problem:
Sometimes a minute of data is skipped. For example, the row for "2014/01/01 09:18:00" may be absent.
This breaks my program's logic.

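What I need:
A way to verify that the CSV file has one row for every minute between 09:15:15 and 15:30:30 on each date. If a minute is missing, copy the previous row and insert it for that missing minute.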

Can anyone help?
Thanks.

2 answers:

Answer 0: (score: 1)

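You can basically read two consecutive rows and compute the time delta between them. If it is not 1 minute, a row is missing. Just write the previous row to the CSV again, terminated by a newline! You could also write everything to a new CSV file.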

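import csv
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # timestamp format used in the sample rows
ONE_MINUTE = timedelta(minutes=1)

# Read the original file and write the repaired rows to a new file
# ("fixed.csv" is a placeholder name).
with open("your_file.csv", newline="") as src, \
     open("fixed.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # copy the header row unchanged
    pre_line = None
    for cur_line in reader:
        cur_time = datetime.strptime(cur_line[0], FMT)
        if pre_line is not None:
            # Re-emit the previous row once for every missing minute.
            fill_time = datetime.strptime(pre_line[0], FMT) + ONE_MINUTE
            while fill_time < cur_time:
                writer.writerow([fill_time.strftime(FMT)] + pre_line[1:])
                fill_time += ONE_MINUTE
        writer.writerow(cur_line)
        pre_line = cur_line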
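
For the validation half of the question, the same delta check can report gaps without touching the file. The following is a minimal sketch and not part of the original answer; the helper name `find_missing_minutes` is hypothetical, and the timestamp format is assumed from the sample rows in the question.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # assumed from the sample rows in the question


def find_missing_minutes(timestamps, fmt=FMT):
    """Return the minute timestamps missing between consecutive rows.

    Hypothetical helper: `timestamps` is an iterable of strings in
    ascending order, all taken from the same trading day.
    """
    step = timedelta(minutes=1)
    missing = []
    prev = None
    for text in timestamps:
        cur = datetime.strptime(text, fmt)
        if prev is not None:
            # Walk minute by minute from the previous row up to this one.
            expected = prev + step
            while expected < cur:
                missing.append(expected.strftime(fmt))
                expected += step
        prev = cur
    return missing


gaps = find_missing_minutes([
    "2014/01/01 09:16:00",
    "2014/01/01 09:17:00",
    "2014/01/01 09:20:00",
])
print(gaps)
```

An empty list means no minute was skipped between the first and last rows passed in. Feeding it one day at a time avoids reporting the overnight break as a gap.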

Answer 1: (score: 1)

Here is sample code you can use:

from datetime import timedelta

from dateutil.parser import parse


# Sample rows; 09:18 and 09:19 are missing on purpose.
data = [
    ("2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0),
    ("2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0),
    ("2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0),
]


def insert_into_db(date, open_, high, low, close, zero):
    # Stand-in for the real insert; it only prints the row.
    print('inserting {} {} {} {} {} {}'.format(date, open_, high, low, close, zero))


prev_date, prev_values = None, None
for raw_date, *values in data:
    date = parse(raw_date)

    if prev_date is not None:
        # Number of whole minutes missing between the previous row and this one.
        missing = int((date - prev_date).total_seconds()) // 60 - 1
        for i in range(1, missing + 1):
            # Fill each missing minute with a copy of the previous row's values,
            # as the question asks.
            insert_into_db(prev_date + timedelta(minutes=i), *prev_values)

    insert_into_db(date, *values)
    prev_date, prev_values = date, values

The output is:

inserting 2014-01-01 09:16:00 6365.2001953125 6369.89990234375 6355 6355 0
inserting 2014-01-01 09:17:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:18:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:19:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:20:00 6359 6359 6355.64990234375 6356.5 0

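However, you should make sure the rows for the first and last minutes of each day are present (or modify the code to handle them), since only gaps between existing rows are filled.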
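
One way to cover the first and last minutes is to build the full list of expected minutes for each day and carry the last seen row forward. This is a sketch under stated assumptions, not the answer author's code: the helper name `fill_session` is hypothetical, the default session bounds 09:16:00 and 15:30:00 are guessed from the sample rows (the question itself mentions 09:15:15 and 15:30:30), and minutes before a day's first row are copied from that first row because no previous row exists.

```python
from datetime import datetime, time, timedelta
from itertools import groupby

FMT = "%Y/%m/%d %H:%M:%S"  # assumed from the sample rows in the question


def fill_session(rows, start=time(9, 16), end=time(15, 30), fmt=FMT):
    """Return one row per minute between `start` and `end` for each day.

    Hypothetical helper. `rows` are lists whose first item is a timestamp
    string, sorted ascending. A missing minute copies the most recent row
    of the same day; minutes before the day's first row copy that first row.
    Rows that do not fall on the minute grid are ignored.
    """
    step = timedelta(minutes=1)
    filled = []
    by_day = groupby(rows, key=lambda r: datetime.strptime(r[0], fmt).date())
    for day, day_rows in by_day:
        day_rows = list(day_rows)
        seen = {datetime.strptime(r[0], fmt): r for r in day_rows}
        last = day_rows[0]  # assumption: back-fill leading gaps from the first row
        current = datetime.combine(day, start)
        stop = datetime.combine(day, end)
        while current <= stop:
            last = seen.get(current, last)
            filled.append([current.strftime(fmt)] + list(last[1:]))
            current += step
    return filled


sample = [
    ["2014/01/01 09:17:00", "6355.64990234375", "6359.9501953125",
     "6355.5498046875", "6359.5498046875", "0"],
    ["2014/01/01 09:20:00", "6359", "6359", "6355.64990234375", "6356.5", "0"],
]
# A short window keeps the demonstration small.
result = fill_session(sample, start=time(9, 16), end=time(9, 21))
for row in result:
    print(row)
```

Days with no rows at all produce nothing here, so a fully missing trading day would still need separate handling.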

Update: fixed the case where several consecutive minutes are missing.