Why does this query give different results depending on how I arrange my DateTime arithmetic?

Posted: 2018-04-04 15:25:25

Tags: python sqlalchemy

I created a table Record with SQLAlchemy. Each record has a field date, which stores a DateTime. I want to find all records whose date is more recent than eight hours ago.

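I came up with four ways to write the filter, all of them involving simple arithmetic that compares the current time, the record's time and an eight-hour timedelta. The problem is that half of these filters return rows outside the eight-hour window.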

from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
import datetime

Base = declarative_base()

class Record(Base):
    __tablename__ = 'record'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime, nullable=False)

engine = create_engine('sqlite:///records.db')
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)
session = DBSession()

#if the db is empty, add some records to the database with datetimes corresponding to one year ago and one hour ago and yesterday
now = datetime.datetime(2018, 4, 4, 10, 0, 0)
if not session.query(Record).all():
    session.add(Record(date=now - datetime.timedelta(days=365)))
    session.add(Record(date=now - datetime.timedelta(days=1)))
    session.add(Record(date=now - datetime.timedelta(hours=1)))
    session.commit()  # persist the sample rows so later runs find them


delta = datetime.timedelta(hours=8)

#these are all equivalent to "records from the last eight hours"
criterion = [
    (now - Record.date < delta),
    (Record.date > now - delta),
    (delta > now - Record.date),
    (now - delta < Record.date),
]

for idx, crit in enumerate(criterion):
    query = session.query(Record).filter(crit)
    print("\n\nApproach #{}.".format(idx))
    print("Generated statement:")
    print(query.statement)
    records = query.all()
    print("{} row(s) retrieved.".format(len(records)))
    for record in records:  # reuse the fetched rows instead of running the query again
        print(record.id, record.date)

Results:

Approach #0.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE :date_1 - record.date < :param_1
3 row(s) retrieved.
1 2017-04-04 10:00:00
2 2018-04-03 10:00:00
3 2018-04-04 09:00:00


Approach #1.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE record.date > :date_1
1 row(s) retrieved.
3 2018-04-04 09:00:00


Approach #2.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE :date_1 - record.date < :param_1
3 row(s) retrieved.
1 2017-04-04 10:00:00
2 2018-04-03 10:00:00
3 2018-04-04 09:00:00


Approach #3.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE record.date > :date_1
1 row(s) retrieved.
3 2018-04-04 09:00:00

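Approaches #1 and #3 are correct: they return the record from one hour ago, but not the records from one day ago or one year ago. Approaches #0 and #2 are incorrect, because they return the record from one day ago and the record from one year ago in addition to the one from one hour ago.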

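What causes this discrepancy? I noticed that the statements generated for #1 and #3 parameterize only a single datetime object, while #0 and #2 parameterize both a datetime object and a timedelta object. Are timedeltas parameterized in some unusual way that makes them unsuitable for this kind of arithmetic?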

1 Answer:

Answer 0 (score: 8)

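As noted by unutbu, when timedelta objects are used as bind parameters against databases that do not support a native Interval type, they are converted to timestamps relative to the "epoch" (1 January 1970). SQLite is one such database, as is MySQL. Another notable thing, once you turn on logging, is that datetime values are stored and passed as ISO formatted strings.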
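To make the epoch conversion concrete, here is a pure-Python sketch that mimics it. It does not call SQLAlchemy; the helpers `interval_as_epoch_datetime` and `sqlite_datetime_string` are hypothetical, and the string format is an assumption chosen to match the `'1970-01-01 08:00:00.000000'` parameter that appears in the logged query later in this answer.

```python
import datetime

# Assumption: a naive 1970-01-01 00:00:00 epoch, as described above.
EPOCH = datetime.datetime(1970, 1, 1)


def interval_as_epoch_datetime(delta: datetime.timedelta) -> datetime.datetime:
    """Hypothetical helper: represent a timedelta as a datetime offset from the epoch."""
    return EPOCH + delta


def sqlite_datetime_string(value: datetime.datetime) -> str:
    """Hypothetical helper: render a datetime as an ISO-style string with microseconds."""
    return value.strftime("%Y-%m-%d %H:%M:%S.%f")


eight_hours = datetime.timedelta(hours=8)
print(sqlite_datetime_string(interval_as_epoch_datetime(eight_hours)))
# prints 1970-01-01 08:00:00.000000
```

In other words, the "interval" that reaches SQLite is just another date string, not a number of seconds.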

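The DATETIME column in SQLite has NUMERIC affinity, but since ISO formatted strings cannot be losslessly converted to a numeric value, they keep the TEXT storage class. This is fine, on the other hand, because SQLite has 3 ways to store date and time data: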

  
      
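  • TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
  • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
  • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.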
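A small sketch of the affinity behaviour described above, using only the standard-library `sqlite3` module with an in-memory database and a hypothetical `record` table that mirrors the question's. The ISO string keeps the TEXT storage class, a purely numeric string is converted by the NUMERIC affinity, and `strftime('%s', ...)` / `julianday()` turn the TEXT form into the other two representations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DATETIME is not a built-in SQLite type; by the affinity rules it gets NUMERIC affinity.
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, date DATETIME)")
# An ISO string cannot be converted losslessly to a number, so it stays TEXT...
conn.execute("INSERT INTO record (id, date) VALUES (1, ?)", ("2018-04-04 09:00:00.000000",))
# ...whereas text that looks like an integer is converted by the NUMERIC affinity.
conn.execute("INSERT INTO record (id, date) VALUES (2, ?)", ("1522832400",))

storage_classes = [row[0] for row in conn.execute("SELECT typeof(date) FROM record ORDER BY id")]

# The date functions understand the TEXT form and can convert it to
# Unix time (seconds) and to a Julian day number (REAL).
unix_seconds, julian_day = conn.execute(
    "SELECT CAST(strftime('%s', date) AS INTEGER), julianday(date) FROM record WHERE id = 1"
).fetchone()
print(storage_classes, unix_seconds, julian_day)
```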

Things get more interesting when you try to perform arithmetic in the database:

In [18]: session.execute('SELECT :date_1 - record.date FROM record',
    ...:                 {"date_1": now}).fetchall()
2018-04-04 20:47:35,045 INFO sqlalchemy.engine.base.Engine SELECT ? - record.date FROM record
INFO:sqlalchemy.engine.base.Engine:SELECT ? - record.date FROM record
2018-04-04 20:47:35,045 INFO sqlalchemy.engine.base.Engine (datetime.datetime(2018, 4, 4, 10, 0),)
INFO:sqlalchemy.engine.base.Engine:(datetime.datetime(2018, 4, 4, 10, 0),)
Out[18]: [(1,), (0,), (0,)]

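The reason is that all mathematical operators cast their operands to the NUMERIC storage class, even if the resulting values are lossy, or meaningless for that matter. In this case the year part is parsed and the rest is ignored.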
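The coercion can be reproduced without SQLAlchemy. This sketch uses the standard-library `sqlite3` module and binds ISO strings in the same format as the stored dates (an assumption about the exact stored format); only the leading year survives the conversion, so a one-year gap becomes `1` and a one-day gap becomes `0`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

later = "2018-04-04 10:00:00.000000"
a_year_earlier = "2017-04-04 10:00:00.000000"
a_day_earlier = "2018-04-03 10:00:00.000000"

# Converting the ISO string to NUMERIC keeps only its leading numeric part: the year.
as_number = conn.execute("SELECT CAST(? AS NUMERIC)", (later,)).fetchone()[0]

# Subtraction coerces both TEXT operands the same way, so only the years are subtracted.
year_gap = conn.execute("SELECT ? - ?", (later, a_year_earlier)).fetchone()[0]
day_gap = conn.execute("SELECT ? - ?", (later, a_day_earlier)).fetchone()[0]
print(as_number, year_gap, day_gap)
```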

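Since any INTEGER or REAL value is less than any TEXT or BLOB value, and neither operand of the comparison here has a column affinity that would trigger a conversion first, all comparisons between the resulting integer values and the given ISO formatted interval string are true: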

In [25]: session.execute(text('SELECT :date_1 - record.date < :param_1 FROM record')
    ...:                 .bindparams(bindparam('param_1', type_=Interval)),
    ...:                 {"date_1": now, "param_1": delta}).fetchall()
    ...:                 
2018-04-04 20:55:36,952 INFO sqlalchemy.engine.base.Engine SELECT ? - record.date < ? FROM record
INFO:sqlalchemy.engine.base.Engine:SELECT ? - record.date < ? FROM record
2018-04-04 20:55:36,952 INFO sqlalchemy.engine.base.Engine (datetime.datetime(2018, 4, 4, 10, 0), '1970-01-01 08:00:00.000000')
INFO:sqlalchemy.engine.base.Engine:(datetime.datetime(2018, 4, 4, 10, 0), '1970-01-01 08:00:00.000000')
Out[25]: [(1,), (1,), (1,)]
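The same comparison rule, isolated in plain `sqlite3` as a sketch (the integers stand in for the year differences computed above): a bound INTEGER compared with a bound TEXT value always comes out smaller, whereas a purely numeric comparison behaves as expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
interval_text = "1970-01-01 08:00:00.000000"

# Bound parameters have no affinity, so nothing is converted before comparing;
# by SQLite's ordering rules an INTEGER is then always less than a TEXT value.
mixed = [
    conn.execute("SELECT ? < ?", (n, interval_text)).fetchone()[0]
    for n in (0, 1, 31536000)
]

# Number against number works as expected: one year in seconds is not under 8 hours.
numeric = conn.execute("SELECT ? < ?", (31536000, 28800.0)).fetchone()[0]
print(mixed, numeric)
```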

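Some might call all this a leaky abstraction, but providing workarounds in SQLAlchemy for every difference between database implementations would be a daunting, if not impossible, task. Personally I find that it does not get in the way; instead it lets you use the database's own features, just through a nice Python DSL. If you truly need to support time differences across different databases in a single code base, create a custom construct with suitable database-specific compilers.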

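To actually compute the difference in SQLite and compare it with the total seconds of the given timedelta, you need to use the strftime() function to convert the ISO formatted strings to seconds since the epoch. julianday() would also work, as long as you convert the Python datetime as well and transform the result to seconds. Replace the 2 misbehaving comparisons with, for example: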

from sqlalchemy import func

# Not sure if your times were supposed to be UTC or not
now_ts = now.replace(tzinfo=datetime.timezone.utc).timestamp()
delta_s = delta.total_seconds()

# Not quite pretty...
# strftime('%s', ...) yields Unix seconds as TEXT; the subtraction coerces it to a number
criterion = [
    (now_ts - func.strftime('%s', Record.date) < delta_s),
    (Record.date > now - delta),
    (delta_s > now_ts - func.strftime('%s', Record.date)),
    (now - delta < Record.date),
]
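To check that the seconds-based comparison really filters correctly, here is a sketch in plain `sqlite3` rather than SQLAlchemy. It stores the question's three sample dates as ISO strings (the storage format and the UTC interpretation are assumptions, matching the snippet above) and evaluates both the `strftime('%s', ...)` version and the `julianday()` alternative; each should keep only the record from one hour ago.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, date DATETIME NOT NULL)")

now = datetime.datetime(2018, 4, 4, 10, 0, 0)
delta = datetime.timedelta(hours=8)
ISO = "%Y-%m-%d %H:%M:%S.%f"  # assumed storage format for the dates

for stamp in (now - datetime.timedelta(days=365),
              now - datetime.timedelta(days=1),
              now - datetime.timedelta(hours=1)):
    conn.execute("INSERT INTO record (date) VALUES (?)", (stamp.strftime(ISO),))

# SQLite's date functions treat naive strings as UTC, so do the same in Python.
now_ts = now.replace(tzinfo=datetime.timezone.utc).timestamp()
delta_s = delta.total_seconds()

# strftime('%s', ...) returns Unix seconds as TEXT; the subtraction coerces it to a number.
by_strftime = conn.execute(
    "SELECT id FROM record WHERE ? - strftime('%s', date) < ? ORDER BY id",
    (now_ts, delta_s),
).fetchall()

# julianday() returns days as REAL; multiply by 86400 to compare in seconds.
by_julianday = conn.execute(
    "SELECT id FROM record WHERE (julianday(?) - julianday(date)) * 86400 < ? ORDER BY id",
    (now.strftime(ISO), delta_s),
).fetchall()
print(by_strftime, by_julianday)
```

Both queries compare numbers with numbers, so the TEXT-versus-INTEGER ordering rule never comes into play.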