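So I have five-minute-interval data from one source and one-minute-interval data from another. What I've done is load the five-minute data into some SQLite tables that hold the relationships between the various data types. The table also has a column for the one-minute data. What I want to do now is find the one-minute data point that matches each five-minute data point, update that row, and then propagate the five-minute data forward into new rows for the remaining one-minute data points.
i.e.: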
DB before                          DB after
row [time] [1-min] [5-min]         row [time] [1-min] [5-min]
    5-t0   null    d0                  5-t0   m0      d0
    5-t1   null    d1                  1-t1   m1      d0
    ...                                1-t2   m2      d0
                                       1-t3   m3      d0
                                       5-t1   m4      d1
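and so on.
The problem is that the function I'm using to do this is very slow. As in, slower than molasses on a cold winter day. I'm new to SQL operations and SQLAlchemy, so any critique of my function would be greatly appreciated. Here's what I have: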
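To make the intended matching concrete, here is a minimal sketch in plain Python with no database involved. The helper names `floor_to_five` and `merge` and the sample values are hypothetical and exist only for illustration. It assumes each one-minute timestamp should inherit the five-minute value at or immediately before it.

```python
from datetime import datetime, timedelta

def floor_to_five(ts):
    """Round a timestamp down to the previous five-minute boundary."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

def merge(five_min, one_min):
    """Pair every one-minute value with the five-minute value it inherits.

    five_min: {datetime: d}, one_min: {datetime: m}
    Returns a time-sorted list of (time, m, d) tuples.
    """
    rows = []
    for ts in sorted(one_min):
        anchor = floor_to_five(ts)
        if anchor in five_min:  # skip points with no earlier five-minute row
            rows.append((ts, one_min[ts], five_min[anchor]))
    return rows

# Hypothetical sample data: two five-minute rows, six one-minute rows.
t0 = datetime(2015, 1, 1, 0, 0)
five_min = {t0: "d0", t0 + timedelta(minutes=5): "d1"}
one_min = {t0 + timedelta(minutes=i): "m{}".format(i) for i in range(6)}
merged = merge(five_min, one_min)
```

Each tuple in `merged` corresponds to one row of the "DB after" layout: the timestamp, the one-minute value, and the five-minute value carried forward.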
import glob
import gc
import csv
from datetime import datetime, timedelta
from sqlalchemy import create_engine, and_
from sqlalchemy.orm import sessionmaker, exc, lazyload
from metar import Metar
from db_models import Base, BaseObservation, SkyObservation

def w_convertion(w_str):
    if w_str == 'No Data':
        return None
    else:
        return float(w_str)

# `session` and `keepers` are defined elsewhere (not shown here).
def db_update(W_PATH):
    with open(W_PATH) as f:
        reader = csv.DictReader(f)
        for line in reader:
            time = datetime.strptime(line['time'], '%m/%d/%Y %H:%M')
            qry = session.query(BaseObservation).\
                options(lazyload(BaseObservation.sky_observations))
            try:
                result = qry.filter_by(time=time).one()
                if result.W:
                    pass
                setattr(result, 'W', w_convertion(line['W']))
                session.commit()
            except exc.MultipleResultsFound:
                print('More than one result. Skipped {}'.format(time))
                pass
            except exc.NoResultFound:
                dt = timedelta(minutes=(time.minute % 5))
                t = (time - dt)
                # find previous observation
                try:
                    result = qry.filter_by(time=t).one()
                    keeper_dict = dict((k, v) for k, v in result.__dict__.items() if k in keepers)
                    # make new observation with previous data
                    obs = BaseObservation(**keeper_dict)
                    setattr(obs, 'time', time)
                    setattr(obs, 'W', w_convertion(line['W']))
                    session.add(obs)
                    session.flush()
                    # link new observation to sky observations
                    for sky_obs in result.sky_observations:
                        sky_obs.base_observations.append(obs)
                    session.commit()
                except exc.NoResultFound:
                    print('No previous weather obs found for {}. Point skipped.'.format(time))
                    pass
    gc.collect()
    print('W additions done.')
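
For comparison, here is a rough sketch of the same update done in bulk. It uses the standard-library `sqlite3` module rather than SQLAlchemy so that it is self-contained. The table `obs`, its columns `time`, `w` and `five`, and the function `bulk_update` are all hypothetical stand-ins, not my real schema. It ignores the `sky_observations` relationship and assumes timestamps are stored as strings in the same format as the CSV. It differs from the function above in two ways: it reads the existing rows with a single SELECT instead of one query per CSV line, and it commits once at the end instead of once per row.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M"

def bulk_update(conn, one_min_rows):
    """Apply one-minute values to a hypothetical `obs` table in bulk.

    one_min_rows: iterable of (time_string, w_value_or_None).
    Returns (number_of_updates, number_of_inserts).
    """
    # One SELECT up front: map each stored time to its five-minute value.
    existing = dict(conn.execute("SELECT time, five FROM obs"))
    updates, inserts = [], []
    for time_str, w in one_min_rows:
        if time_str in existing:
            # Exact match: just fill in the one-minute column.
            updates.append((w, time_str))
            continue
        ts = datetime.strptime(time_str, FMT)
        anchor = (ts - timedelta(minutes=ts.minute % 5)).strftime(FMT)
        if anchor in existing:
            # New row that carries the previous five-minute value forward.
            inserts.append((time_str, w, existing[anchor]))
    conn.executemany("UPDATE obs SET w = ? WHERE time = ?", updates)
    conn.executemany("INSERT INTO obs (time, w, five) VALUES (?, ?, ?)", inserts)
    conn.commit()  # single commit for the whole batch
    return len(updates), len(inserts)

# Hypothetical in-memory demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (time TEXT PRIMARY KEY, w REAL, five TEXT)")
conn.executemany("INSERT INTO obs (time, five) VALUES (?, ?)",
                 [("01/01/2015 00:00", "d0"), ("01/01/2015 00:05", "d1")])
rows = [("01/01/2015 00:0{}".format(i), float(i)) for i in range(6)]
counts = bulk_update(conn, rows)
```

I have not measured whether this pattern is faster on the real data, or whether it carries over to the ORM models above.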