I have a CSV file containing stock price data. A sample is shown below.
'dttm','open','high','low','close'
"2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0
"2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0
"2014/01/01 09:18:00",6359.5,6359.7998046875,6358,6359,0
"2014/01/01 09:19:00",6358.9501953125,6359.4501953125,6357.5498046875,6359,0
"2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0
.....likewise till "2014/01/01 15:30:30" (and for further dates ahead)
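Each row contains one minute of data.
Problem:
Sometimes a minute of data is skipped. For example, the row for "2014/01/01 09:18:00" would be missing.
This breaks my program logic.
What I need:
A way to verify that the CSV file has one row for every minute between 09:15:15 and 15:30:30 on each date. If a minute is missing, copy the previous row and insert it for that missing minute.
Can anyone help?
Thanks.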
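Answer 0 (score: 1)
You can basically read two consecutive rows and take the time difference between them. If it is not 1 minute, a row is missing. Just write the previous row to the CSV for the missing minute, with a newline at the end! You could also write everything to a new CSV file.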
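import csv
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # timestamp format of the first column
ONE_MINUTE = timedelta(minutes=1)

# Read the original file and write the repaired rows to a new file
# (opening the input with "w+" would truncate it).
with open("your_file.csv", newline="") as src, open("filled.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # copy the header row
    pre_line = next(reader)
    writer.writerow(pre_line)
    for cur_line in reader:
        pre_time = datetime.strptime(pre_line[0], FMT)
        cur_time = datetime.strptime(cur_line[0], FMT)
        # Within the same day, repeat the previous row for every missing minute.
        while pre_time.date() == cur_time.date() and cur_time - pre_time > ONE_MINUTE:
            pre_time += ONE_MINUTE
            writer.writerow([pre_time.strftime(FMT)] + pre_line[1:])
        writer.writerow(cur_line)
        pre_line = cur_line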
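If you only want to validate the file before changing anything, the same time-difference idea can be used to report the gaps. The following is a minimal sketch, not a definitive implementation: `find_missing_minutes` and `FMT` are hypothetical names, the timestamp format is assumed from the sample rows, and the input is assumed to be sorted by time.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # assumed timestamp format, taken from the sample rows

def find_missing_minutes(timestamps):
    """Return the timestamps absent between consecutive same-day entries."""
    missing = []
    times = [datetime.strptime(t, FMT) for t in timestamps]
    for prev, cur in zip(times, times[1:]):
        if prev.date() != cur.date():
            continue  # overnight gaps are expected, so skip them
        step = prev + timedelta(minutes=1)
        while step < cur:
            missing.append(step.strftime(FMT))
            step += timedelta(minutes=1)
    return missing

sample = ["2014/01/01 09:16:00", "2014/01/01 09:17:00", "2014/01/01 09:20:00"]
print(find_missing_minutes(sample))
# → ['2014/01/01 09:18:00', '2014/01/01 09:19:00']
```

An empty list means every minute between the first and last row of each day is present. It does not check whether the day starts and ends at the expected times.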
Answer 1 (score: 1)
Here is some sample code you can use:
from dateutil.parser import parse
from datetime import timedelta

data = [
    ("2014/01/01 09:16:00", 6365.2001953125, 6369.89990234375, 6355, 6355, 0),
    ("2014/01/01 09:17:00", 6355.64990234375, 6359.9501953125, 6355.5498046875, 6359.5498046875, 0),
    ("2014/01/01 09:20:00", 6359, 6359, 6355.64990234375, 6356.5, 0),
]

def insert_into_db(date, open_, high, low, close, zero):
    # Placeholder for the real insert; open_ avoids shadowing the built-in open().
    print('inserting {} {} {} {} {} {}'.format(date, open_, high, low, close, zero))

prev_date = None
prev_values = None
for date, *values in data:
    date = parse(date)
    if prev_date is not None and date.date() == prev_date.date():
        # Fill every missing minute with a copy of the previous row's values.
        fill_date = prev_date + timedelta(minutes=1)
        while fill_date < date:
            insert_into_db(fill_date, *prev_values)
            fill_date += timedelta(minutes=1)
    insert_into_db(date, *values)
    prev_date, prev_values = date, values
The output is:
inserting 2014-01-01 09:16:00 6365.2001953125 6369.89990234375 6355 6355 0
inserting 2014-01-01 09:17:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:18:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:19:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:20:00 6359 6359 6355.64990234375 6356.5 0
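But you should make sure that the first and last minutes of the day are present (or modify the code to handle them).
Update: fixed the case where several consecutive minutes are missing.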
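One way to handle a missing first or last minute is to walk over every expected minute of the session instead of comparing neighbouring rows. The following is only a sketch under several assumptions: `fill_session` is a hypothetical helper, the `start` and `end` defaults are guesses based on the sample (adjust them to the real session times), rows are assumed to fall on whole minutes, and minutes before the first observed row of a day are filled from that first row because no earlier row exists. Rows outside the session bounds are dropped.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # assumed timestamp format, taken from the sample rows

def fill_session(rows, start="09:16:00", end="15:30:00"):
    """Return one row per minute between start and end for each date in rows.

    rows: list of lists whose first item is a timestamp string.
    start/end: hypothetical session bounds; adjust to the real trading hours.
    """
    by_time = {datetime.strptime(r[0], FMT): list(r[1:]) for r in rows}
    start_t = datetime.strptime(start, "%H:%M:%S").time()
    end_t = datetime.strptime(end, "%H:%M:%S").time()
    filled = []
    for day in sorted({t.date() for t in by_time}):
        t = datetime.combine(day, start_t)
        last = datetime.combine(day, end_t)
        # Seed with the day's first observed row so leading gaps can be filled too.
        values = by_time[min(k for k in by_time if k.date() == day)]
        while t <= last:
            values = by_time.get(t, values)  # carry the previous row forward
            filled.append([t.strftime(FMT)] + values)
            t += timedelta(minutes=1)
    return filled

rows = [
    ["2014/01/01 09:17:00", 6355.64990234375, 6359.9501953125, 6355.5498046875, 6359.5498046875, 0],
    ["2014/01/01 09:20:00", 6359, 6359, 6355.64990234375, 6356.5, 0],
]
# A short window keeps the demonstration small.
for row in fill_session(rows, start="09:16:00", end="09:21:00"):
    print(row)
```

With this input the result has six rows, 09:16 through 09:21: the 09:16, 09:18 and 09:19 rows repeat the 09:17 values, and the 09:21 row repeats the 09:20 values.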