Question

I'm struggling with a complex SQL query.

Here is a sqlfiddle containing the tables/data: http://sqlfiddle.com/#!2/7de65

It might make more sense if I explain what the tables are doing:

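schedules is a list of train timetables; calling is the list of calling points for a schedule, ordered by when the train will pass through them; an activation is created when it is confirmed that a train is going to run [1]; and a movement is created when the train moves past a designated calling point.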

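calling is linked to schedules through calling.sid. activations is linked to schedules through activations.sid. movement is linked to activations through movement.activation and to calling through movement.calling_id.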

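Now the actual question:

I want to generate a list of the trains that are active during each minute. A train is considered active if:

It has at least one movement associated with it (i.e. it has left its origin)
It has no movement associated with its final calling point
It was activated less than 24 hours ago

A train should be counted as active for as long as it meets all of these criteria.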

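Based on the data in the fiddle above, the train left its first calling point at 14:20 and arrived at its last calling point at 15:04, so it should be included in every minute from 14:20 to 15:04. I was wondering if anyone could shed some light on how to do this. I don't consider myself an SQL expert (which is probably why I'm struggling; in fact I don't consider myself even vaguely competent, but that's a different problem, or possibly the same one, I'm not sure).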
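To pin the three criteria down, here is a minimal Python sketch of the per-train test. It is only an illustration: the function name `is_active`, its parameters, and the representation of movements as `(calling_id, actual)` tuples are assumptions, not part of the schema. The boundary handling (departure minute and arrival minute both count) is chosen to match the 14:20 to 15:04 example, and the 24-hour window is measured from the minute being tested rather than from the wall clock.

```python
from datetime import datetime, timedelta


def is_active(activated, movements, final_calling_id, at):
    """Return True if a train counts as active at the minute `at`.

    activated        -- datetime at which the activation was created
    movements        -- iterable of (calling_id, actual) tuples for the activation
    final_calling_id -- id of the schedule's last calling point
    at               -- the minute being tested
    """
    # Criterion 1: at least one movement has been recorded by `at`.
    has_moved = any(actual <= at for _, actual in movements)
    # Criterion 2: no movement at the final calling point strictly before `at`,
    # so the arrival minute itself is still counted.
    has_finished = any(
        cid == final_calling_id and actual < at for cid, actual in movements
    )
    # Criterion 3: activated less than 24 hours before `at` (assumption).
    recently_activated = timedelta(0) <= at - activated < timedelta(hours=24)
    return has_moved and not has_finished and recently_activated
```

With a hypothetical activation at 14:00 and movements at 14:20 (first calling point) and 15:04 (last calling point), this returns False for 14:19, True from 14:20 through 15:04, and False from 15:05 onwards.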

I started down this route:

SELECT
    YEAR( activations.activated ),
    MONTH( activations.activated ),
    DAY( activations.activated ),
    HOUR( activations.activated ),
    MINUTE( activations.activated ),
    count(activations.id)
FROM activations, movement, calling, schedules 
WHERE activations.id = movement.activation AND movement.calling_id = calling.id AND schedules.id = activations.sid
GROUP BY DAYOFYEAR( activations.activated ) , HOUR( activations.activated ), MINUTE(activations.activated )

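But I know this is wrong, because a train will only be listed once, no matter how long it was active.

I also considered looping over every minute of the given period directly in Python. That sort of works, but it is super slow (getting the active trains at minute resolution for 24 hours takes 1440 queries, and they aren't fully optimised). So I figure there must be either some clever grouping or some kind of loop in SQL, but I don't know how to do either.

So, if I ran the query for 14:18 to 15:07, I would get something like: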

+-----------------+------------------+
| Timestamp       | Active services  |
+-----------------+------------------+
| 14:18 1/1/2014  | 0                |
| 14:19 1/1/2014  | 0                |
| 14:20 1/1/2014  | 1                |
| 14:21 1/1/2014  | 1                |
| 14:22 1/1/2014  | 1                |
[...
Identical record for every minute through to
    ...]
| 15:03 1/1/2014  | 1                |
| 15:04 1/1/2014  | 1                |
| 15:05 1/1/2014  | 0                |
| 15:06 1/1/2014  | 0                |
| 15:07 1/1/2014  | 0                |
+-----------------+------------------+

(The format of the timestamp doesn't matter, as long as I can parse it later.)

In my head I can see it working a bit like this (pseudocode):

while time is between report_start_date and report_end_date:
    records = count(
        activations where number of movements(
            movement.actual < time
        ) > 0 //Number of movements created before current minute
            and
        movement.calling_id = calling_points(
            actual < minute
        ).last.id does not exist //As of this minute doesn't have a movement for last calling point
            and
        activations.activated > now - 24 hours //Was activated less than 24 hours ago
    )
    result timestamp, records
    time + 1 minute

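I have pretty much got the records = count() part sorted; it's the looping or the grouping by time that I'm not sure about. I could group by the date of the first movement record, but then the train would only show up in that first minute. I want it to show up for every minute it is active.

Bonus points

I'm actually trying to implement this in SQLAlchemy (hence the tag). I wanted to get the basics working in SQL before porting it to a SQLAlchemy query, but if you can do it in both SQL and SQLAlchemy/Python you'll get something. I'm not sure what yet, and it may be hypothetical.

[1] Before anyone who really knows about this stuff criticises me: an activation doesn't actually confirm that a train will run, but it's close enough for my current purposes. My final query will exclude cancellations and so on, but I just want to get the basics working first.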

Answer 1

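To generate a result for every minute, I wouldn't rely on every possible minute being present as a value in some table in the database. For this reason I would create a "static" table in the database that stores just those timestamps, and build the query from there. I did the following: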

CREATE TABLE "static_time" (
    "yyyymmddhhmm" datetime NOT NULL,
    PRIMARY KEY ("yyyymmddhhmm")
);

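Note: I used a sqlite database for all my tests, so you may need to change some places to use the corresponding mysql constructs.

I also added all the minutes for a 2-day period for testing. You should probably do the same, from the time of the first analysis you want to run up to some distant date in the future (e.g. 2050-12-31T23:59:00). I did this using sqlalchemy, but I'm sure it would make sense to do it directly with some function or loop: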

class StaticTime(Base):
    __tablename__ = 'static_time'
    __table_args__ = ({'autoload': True, },)

# ...

def populate_static_time():
    print("Adding static times")
    sdt = datetime(2014, 1, 1)
    edt = sdt + timedelta(days=2)
    cdt = sdt
    while cdt <= edt:
        session.add(StaticTime(yyyymmddhhmm = cdt))
        cdt += timedelta(minutes=1)
    session.commit()
populate_static_time()
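As a sketch of the "directly with some function or loop" option, the minute table can also be filled without the ORM. The example below uses Python's built-in sqlite3 module and a recursive common table expression. The helper name `fill_static_time`, the in-memory database, and the two-day range are assumptions made for illustration. On MySQL, `WITH RECURSIVE` is only available from version 8.0, and the date arithmetic would need the MySQL equivalent of SQLite's `datetime(ts, '+1 minute')`.

```python
import sqlite3


def fill_static_time(conn, start, end):
    """Insert one row per minute from `start` to `end` (both inclusive).

    `start` and `end` are 'YYYY-MM-DD HH:MM:SS' strings.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS static_time ("
        " yyyymmddhhmm datetime NOT NULL PRIMARY KEY)"
    )
    conn.execute(
        """
        WITH RECURSIVE minutes(ts) AS (
            SELECT datetime(?)
            UNION ALL
            -- keep adding one minute until the end of the range is reached
            SELECT datetime(ts, '+1 minute') FROM minutes WHERE ts < datetime(?)
        )
        INSERT INTO static_time (yyyymmddhhmm) SELECT ts FROM minutes
        """,
        (start, end),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
fill_static_time(conn, "2014-01-01 00:00:00", "2014-01-03 00:00:00")
row_count = conn.execute("SELECT count(*) FROM static_time").fetchone()[0]
```

This does the whole fill in a single statement instead of one `session.add()` per minute, which matters once the range grows to years.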

Also, I assume your SA model includes relationships defined as follows:

# MODEL
class Schedule(Base):
    __tablename__ = 'schedules'
    __table_args__ = ({'autoload': True, },)


class Calling(Base):
    __tablename__ = 'calling'
    __table_args__ = ({'autoload': True, },)


class Activation(Base):
    __tablename__ = 'activations'
    __table_args__ = ({'autoload': True, },)

    # relationships:
    schedule = relationship("Schedule")


class Movement(Base):
    __tablename__ = 'movement'
    __table_args__ = ({'autoload': True, },)

    # relationships:
    # @note: use activation_rel as activation is column name
    activation_rel = relationship("Activation", backref="movements")

Now, let's build the query:

# 0. start with all times and proper counting (columns in SELECT)
q = session.query(
        StaticTime.yyyymmddhhmm.label("yyyymmddhhmm"),
        func.count(Activation.id.distinct()).label("count"),
    )

# 1. join on the trains which are active (or finished, which will be excluded later)
q = q.filter(Activation.movements.any(Movement.actual < StaticTime.yyyymmddhhmm))

# 2. join on the trains which are not finished (or no rows for those that did not)
# 2.a) subquery to get the "last" calling per sid
last_calling_sqry = (session.query(
        Calling.sid.label("sid"),
        func.max(Calling.id).label("max_calling_id"),
    )
    .group_by(Calling.sid)
).subquery("xxx")

# 2.b) subquery to find the movement for the "last" calling
train_done_at_sqry = (session.query(
        Activation.id.label("activation_id"),
        Movement.actual.label("arrived_time"),
    )
    .join(last_calling_sqry, Activation.sid == last_calling_sqry.c.sid)
    .join(Movement, and_(
            Movement.calling_id == last_calling_sqry.c.max_calling_id,
            Movement.activation == Activation.id,
        ))
).subquery("yyy")

# 2.c) let's use it now
q = q.outerjoin(train_done_at_sqry,
        train_done_at_sqry.c.activation_id == Activation.id,
    )
# 2.d) only those that arrived "after" currently tested time
q = q.filter(train_done_at_sqry.c.arrived_time >= StaticTime.yyyymmddhhmm)


# 3. add filter to use only those trains that started in last 24 hours
# @note: do not need this in case when step-X is used as well as it filters
# @TODO: replace func.date(...) with MYSQL version
q = q.filter(Activation.activated >= func.date("now", "-1 days"))

# 4. filter and group by
q = q.group_by(StaticTime.yyyymmddhhmm)
q = q.order_by(StaticTime.yyyymmddhhmm)

# @NOTE: at this point "q" will return only those minutes which have at least 1 active train

# X. FINALLY: WRAP AGAIN TO HAVE ALL MINUTES (also those with no active trains)
sub = q.subquery("sub")
w = session.query(
        StaticTime.yyyymmddhhmm.label("Timestamp"),
        func.ifnull(sub.c.count, 0).label("Active Services")
        )
w = w.outerjoin(sub, sub.c.yyyymmddhhmm == StaticTime.yyyymmddhhmm)
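# restrict the report to the requested time window
# (these bounds are the ones shown in PARAMS for the generated SQL below)
w = w.filter(StaticTime.yyyymmddhhmm >= datetime(2014, 1, 1, 14, 15))
w = w.filter(StaticTime.yyyymmddhhmm <= datetime(2014, 1, 1, 15, 10))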

for a in w:
    print(a)

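This is a fairly complex query, and with only the data you provided it is hard to test different scenarios. But hopefully you can compare it against your current results, and the code will give you some hints on how to accomplish this. Also, I may have joined on the wrong column in some places (actual vs planned). Again, this may not work on mysql (I don't have it and don't know it well).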
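For the @TODO note about `func.date(...)` in step 3, one dialect-independent option is to compute the cutoff in Python and pass it to the query as a bound parameter, so that no SQLite-specific or MySQL-specific date function is needed. The sketch below is an assumption rather than part of the tested code: the helper name `activation_cutoff` is hypothetical, and it assumes activation times are stored as naive datetimes in the same timezone as the application. Note also that SQLite's `date("now", "-1 days")` yields a date without a time part, so it does not express an exact 24-hour window.

```python
from datetime import datetime, timedelta


def activation_cutoff(now=None, hours=24):
    """Return the earliest activation time that still counts as recent."""
    if now is None:
        # Assumption: the database stores naive local timestamps.
        now = datetime.now()
    return now - timedelta(hours=hours)


# Hypothetical usage with the query built above:
# q = q.filter(Activation.activated >= activation_cutoff())
```

Passing `now` explicitly makes the cutoff reproducible when a historical report is re-run.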

Bonus (in reverse): the SQL statement generated for the w query on sqlite. You may find it easier to start from the raw SQL and gradually move to sqlalchemy.

SELECT static_time.yyyymmddhhmm AS "Timestamp", ifnull(sub.count, ?) AS "Active Services"
FROM static_time 
LEFT OUTER JOIN (
    SELECT static_time.yyyymmddhhmm AS yyyymmddhhmm, count(DISTINCT activations.id) AS count
    FROM activations, static_time 
    LEFT OUTER JOIN (
        SELECT activations.id AS activation_id, movement.actual AS arrived_time
        FROM activations 
        JOIN (
            SELECT calling.sid AS sid, max(calling.id) AS max_calling_id
            FROM calling
            GROUP BY calling.sid
            ) AS xxx 
            ON activations.sid = xxx.sid 
        JOIN movement 
            ON movement.calling_id = xxx.max_calling_id AND movement.activation = activations.id
        ) AS yyy 
        ON yyy.activation_id = activations.id
    WHERE (EXISTS (SELECT 1
        FROM movement
        WHERE activations.id = movement.activation AND movement.actual < static_time.yyyymmddhhmm)
        )
    AND yyy.arrived_time >= static_time.yyyymmddhhmm 
    GROUP BY static_time.yyyymmddhhmm 
    ORDER BY static_time.yyyymmddhhmm
    ) AS sub 
        ON sub.yyyymmddhhmm = static_time.yyyymmddhhmm
WHERE static_time.yyyymmddhhmm >= ? AND static_time.yyyymmddhhmm <= ?

PARAMS: (0, '2014-01-01 14:15:00.000000', '2014-01-01 15:10:00.000000')
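To experiment with the raw SQL without the fiddle, the following self-contained sketch runs a simplified variant on an in-memory SQLite database using Python's sqlite3 module. Several things here are assumptions: the tables are cut down to the columns the query needs, the sample rows are hypothetical (one train leaving at 14:20 and arriving at 15:04), and the "last" calling point is taken to be max(calling.id) per sid, as in the subquery above. Unlike the generated statement, this variant uses NOT EXISTS for the final calling point, so a train that has not arrived yet is still counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE static_time (yyyymmddhhmm TEXT PRIMARY KEY);
CREATE TABLE calling (id INTEGER PRIMARY KEY, sid INTEGER);
CREATE TABLE activations (id INTEGER PRIMARY KEY, sid INTEGER, activated TEXT);
CREATE TABLE movement (id INTEGER PRIMARY KEY, activation INTEGER,
                       calling_id INTEGER, actual TEXT);

-- Hypothetical sample data: one schedule (sid 1) with two calling points.
INSERT INTO calling VALUES (1, 1), (2, 1);
INSERT INTO activations VALUES (1, 1, '2014-01-01 14:00:00');
INSERT INTO movement VALUES (1, 1, 1, '2014-01-01 14:20:00'),
                            (2, 1, 2, '2014-01-01 15:04:00');

-- One row per minute from 14:18 to 15:07.
WITH RECURSIVE minutes(ts) AS (
    SELECT '2014-01-01 14:18:00'
    UNION ALL
    SELECT datetime(ts, '+1 minute') FROM minutes
    WHERE ts < '2014-01-01 15:07:00'
)
INSERT INTO static_time SELECT ts FROM minutes;
""")

ACTIVE_PER_MINUTE = """
SELECT t.yyyymmddhhmm AS "Timestamp",
       count(DISTINCT a.id) AS "Active Services"
FROM static_time AS t
LEFT JOIN activations AS a
       -- activated less than 24 hours before the tested minute
       ON a.activated > datetime(t.yyyymmddhhmm, '-24 hours')
       -- at least one movement recorded by the tested minute
      AND EXISTS (SELECT 1 FROM movement AS m
                  WHERE m.activation = a.id
                    AND m.actual <= t.yyyymmddhhmm)
       -- no movement at the last calling point before the tested minute
      AND NOT EXISTS (SELECT 1 FROM movement AS m
                      WHERE m.activation = a.id
                        AND m.actual < t.yyyymmddhhmm
                        AND m.calling_id = (SELECT max(c.id) FROM calling AS c
                                            WHERE c.sid = a.sid))
GROUP BY t.yyyymmddhhmm
ORDER BY t.yyyymmddhhmm
"""

# Map each minute to its number of active trains.
results = dict(conn.execute(ACTIVE_PER_MINUTE).fetchall())
for timestamp, active in results.items():
    print(timestamp, active)
```

Because the conditions sit in the ON clause of the LEFT JOIN, minutes without any active train still produce a row with a count of 0, so the extra wrapping query is not needed in this variant.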

Generate a list of times from a query with a date range and multiple joins
