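I am struggling with a complex SQL query.
Here is a sqlfiddle containing the tables/data: http://sqlfiddle.com/#!2/7de65
It might make more sense if I explain what the tables are doing;
schedules is a list of train timetables; calling is the ordered list of calling points on a schedule that the train will pass through; an activation is created when it is confirmed that a train is going to run; and a movement is created when the train moves past a given calling point.
calling is linked to schedules via calling.sid. activations is linked to schedules via activations.sid. movement is linked to activations via movement.activation and to calling via movement.calling_id.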
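Now the actual problem;
I would like to produce a list of how many trains are active in each minute. A train is considered active when it meets all of the criteria spelled out in the pseudocode further down. A train should be treated as active for as long as it meets all of those criteria, and therefore be included in the count.
Based on the data in the sqlfiddle above, the train leaves its first calling point at 14:20 and arrives at its last calling point at 15:04, so it should be included in every minute from 14:20 to 15:04. I was wondering if anyone could shed some light on how to do this. I do not consider myself an SQL expert (which is probably why I am struggling, though that may be a different problem, or the same one, I am not sure).
I started going down this route: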
SELECT
YEAR( activations.activated ),
MONTH( activations.activated ),
DAY( activations.activated ),
HOUR( activations.activated ),
MINUTE( activations.activated ),
count(activations.id)
FROM activations, movement, calling, schedules
WHERE activations.id = movement.activation AND movement.calling_id = calling.id AND schedules.id = activations.sid
GROUP BY DAYOFYEAR( activations.activated ) , HOUR( activations.activated ), MINUTE(activations.activated )
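But I know this is wrong, because a train will only be listed once, no matter how long it was active.
I also considered simply looping in Python over every minute of the specified period. That sort of works, but it is super slow (getting the active trains at minute resolution for a 24-hour period takes 1440 queries, not fully optimised). So I think there has to be either some clever grouping or some kind of loop in SQL, but I do not know how to do either.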
So if I ran the query for 14:18 to 15:07, I would get something like:
+-----------------+------------------+
| Timestamp       | Active services  |
+-----------------+------------------+
| 14:18 1/1/2014  | 0                |
| 14:19 1/1/2014  | 0                |
| 14:20 1/1/2014  | 1                |
| 14:21 1/1/2014  | 1                |
| 14:22 1/1/2014  | 1                |
[...
Identical record for every minute through to
...]
| 15:03 1/1/2014  | 1                |
| 15:04 1/1/2014  | 1                |
| 15:05 1/1/2014  | 0                |
| 15:06 1/1/2014  | 0                |
| 15:07 1/1/2014  | 0                |
+-----------------+------------------+
(The format of the timestamp does not matter, as long as I can parse it afterwards.)
In my head I can see it working a bit like this (pseudocode):
while time is between report_start_date and report_end_date:
    records = count(
        activations where number of movements(
            movement.actual < time
        ) > 0 //Number of movements created before current minute
        and
        movement.calling_id = calling_points(
            actual < minute
        ).last.id does not exist //As of this minute doesn't have a movement for last calling point
        and
        activations.activated > now - 24 hours //Was activated less than 24 hours ago
    )
    result timestamp, records
    time + 1 minute
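I have pretty much got the records = count() bit sorted; it is just the looping or the grouping by time that I am not sure about. I could group by the date of the first movement record, but then the train would only show up in that first minute. I want it to show up for every minute it is active.
I am actually trying to implement this in SQLAlchemy (hence the tag). I am trying to get the basics working in SQL before porting it to a SQLAlchemy query, but if you can do it in SQL and SQLAlchemy/Python you get a bonus something. I am not yet sure what it is; it may be hypothetical.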
Answer 0: (score: 0)
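In order to produce a result for every minute, I would not rely on every possible minute being present as a value in some table in the database. For this reason, I would actually create a "static" table in the database which only stores those timestamps, and build the query starting from there. I did the following: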
CREATE TABLE "static_time" (
"yyyymmddhhmm" datetime NOT NULL,
PRIMARY KEY ("yyyymmddhhmm")
);
Note: I used an sqlite database for all my tests; you may need to change a few places to use the corresponding mysql constructs.
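I also added all the rows for a 2-day period for testing. You should probably do the same, from the point where you wish to run the first analysis until some far-off year in the future (e.g. 2050-12-31T23:59:00). I did this using sqlalchemy, but I am sure it makes sense to do this directly in the database using some function or loop: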
from datetime import datetime, timedelta

# Base and session are assumed to be set up elsewhere
class StaticTime(Base):
    __tablename__ = 'static_time'
    __table_args__ = ({'autoload': True, },)

# ...

def populate_static_time():
    print("Adding static times")
    sdt = datetime(2014, 1, 1)       # first minute to store
    edt = sdt + timedelta(days=2)    # last minute to store (inclusive)
    cdt = sdt
    while cdt <= edt:
        session.add(StaticTime(yyyymmddhhmm=cdt))
        cdt += timedelta(minutes=1)
    session.commit()

populate_static_time()
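For comparison, the same table can be filled without the ORM. The following is only a minimal sketch using Python's standard sqlite3 module with an in-memory database; the helper name minute_range and the choice to store timestamps as 'YYYY-MM-DD HH:MM:SS' text are assumptions made for illustration, and a mysql setup would need its own connection and column type.

```python
import sqlite3
from datetime import datetime, timedelta


def minute_range(start, end):
    """Yield one datetime per minute from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(minutes=1)


def populate_static_time(conn, start, end):
    """Create static_time if needed and insert one row per minute."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS static_time (yyyymmddhhmm TEXT PRIMARY KEY)"
    )
    rows = ((ts.strftime("%Y-%m-%d %H:%M:%S"),) for ts in minute_range(start, end))
    # A single executemany call avoids building one ORM object per minute.
    conn.executemany("INSERT OR IGNORE INTO static_time VALUES (?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # hypothetical throwaway database
populate_static_time(conn, datetime(2014, 1, 1), datetime(2014, 1, 3))
minute_count = conn.execute("SELECT COUNT(*) FROM static_time").fetchone()[0]
print(minute_count)
```

Because both ends of the range are inclusive, two days give 2 * 1440 + 1 rows. INSERT OR IGNORE is sqlite syntax and lets the function be re-run over an overlapping range without failing on the primary key.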
Also, I assume that your SA model includes relationships defined as follows:
# MODEL
from sqlalchemy.orm import relationship

class Schedule(Base):
    __tablename__ = 'schedules'
    __table_args__ = ({'autoload': True, },)

class Calling(Base):
    __tablename__ = 'calling'
    __table_args__ = ({'autoload': True, },)

class Activation(Base):
    __tablename__ = 'activations'
    __table_args__ = ({'autoload': True, },)
    # relationships:
    schedule = relationship("Schedule")

class Movement(Base):
    __tablename__ = 'movement'
    __table_args__ = ({'autoload': True, },)
    # relationships:
    # @note: use activation_rel as activation is column name
    activation_rel = relationship("Activation", backref="movements")
Now, let's build the query:
from sqlalchemy import and_, func

# 0. start with all times and proper counting (columns in SELECT)
q = session.query(
    StaticTime.yyyymmddhhmm.label("yyyymmddhhmm"),
    func.count(Activation.id.distinct()).label("count"),
)

# 1. join on the trains which are active (or finished, which will be excluded later)
q = q.filter(Activation.movements.any(Movement.actual < StaticTime.yyyymmddhhmm))

# 2. join on the trains which are not finished (or no rows for those that did not)
# 2.a) subquery to get the "last" calling per sid
last_calling_sqry = (session.query(
        Calling.sid.label("sid"),
        func.max(Calling.id).label("max_calling_id"),
    )
    .group_by(Calling.sid)
).subquery("xxx")

# 2.b) subquery to find the movement for the "last" calling
train_done_at_sqry = (session.query(
        Activation.id.label("activation_id"),
        Movement.actual.label("arrived_time"),
    )
    .join(last_calling_sqry, Activation.sid == last_calling_sqry.c.sid)
    .join(Movement, and_(
        Movement.calling_id == last_calling_sqry.c.max_calling_id,
        Movement.activation == Activation.id,
    ))
).subquery("yyy")

# 2.c) lets use it now
q = q.outerjoin(train_done_at_sqry,
    train_done_at_sqry.c.activation_id == Activation.id,
)

# 2.d) only those that arrived "after" currently tested time
q = q.filter(train_done_at_sqry.c.arrived_time >= StaticTime.yyyymmddhhmm)

# 3. add filter to use only those trains that started in last 24 hours
# @note: do not need this in case when step-X is used as well as it filters
# @TODO: replace func.date(...) with MYSQL version
q = q.filter(Activation.activated >= func.date("now", "-1 days"))

# 4. filter and group by
q = q.group_by(StaticTime.yyyymmddhhmm)
q = q.order_by(StaticTime.yyyymmddhhmm)
# @NOTE: at this point "q" will return only those minutes which have at least 1 active train

# X. FINALLY: WRAP AGAIN TO HAVE ALL MINUTES (also those with no active trains)
sub = q.subquery("sub")
w = session.query(
    StaticTime.yyyymmddhhmm.label("Timestamp"),
    func.ifnull(sub.c.count, 0).label("Active Services")
)
w = w.outerjoin(sub, sub.c.yyyymmddhhmm == StaticTime.yyyymmddhhmm)
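# restrict the report to the requested window; the filter is on StaticTime
# (Activation is not part of this outer query), matching the WHERE clause
# and PARAMS of the generated SQL
w = w.filter(StaticTime.yyyymmddhhmm >= datetime(2014, 1, 1, 14, 15))
w = w.filter(StaticTime.yyyymmddhhmm <= datetime(2014, 1, 1, 15, 10))
for a in w:
    print(a)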
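This is a rather complicated query, and given only the data you provided it is hard to test different scenarios. But hopefully you can compare it against your current results, and the code will give you some hints on how this can be done. Also, I might have joined on the wrong columns in some places (actual vs planned). Again, this might not work on mysql (I do not have it and do not know it well).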
Bonus (in reverse): the SQL statement generated for the w query on sqlite. You may find it easier to start from the raw SQL and move gradually to sqlalchemy.
SELECT static_time.yyyymmddhhmm AS "Timestamp", ifnull(sub.count, ?) AS "Active Services"
FROM static_time
LEFT OUTER JOIN (
    SELECT static_time.yyyymmddhhmm AS yyyymmddhhmm, count(DISTINCT activations.id) AS count
    FROM activations, static_time
    LEFT OUTER JOIN (
        SELECT activations.id AS activation_id, movement.actual AS arrived_time
        FROM activations
        JOIN (
            SELECT calling.sid AS sid, max(calling.id) AS max_calling_id
            FROM calling
            GROUP BY calling.sid
        ) AS xxx
        ON activations.sid = xxx.sid
        JOIN movement
        ON movement.calling_id = xxx.max_calling_id AND movement.activation = activations.id
    ) AS yyy
    ON yyy.activation_id = activations.id
    WHERE (EXISTS (SELECT 1
        FROM movement
        WHERE activations.id = movement.activation AND movement.actual < static_time.yyyymmddhhmm)
    )
    AND yyy.arrived_time >= static_time.yyyymmddhhmm
    GROUP BY static_time.yyyymmddhhmm
    ORDER BY static_time.yyyymmddhhmm
) AS sub
ON sub.yyyymmddhhmm = static_time.yyyymmddhhmm
WHERE static_time.yyyymmddhhmm >= ? AND static_time.yyyymmddhhmm <= ?
PARAMS: (0, '2014-01-01 14:15:00.000000', '2014-01-01 15:10:00.000000')