How to compare two consecutive row values in a result set using Python

Asked: 2018-09-14 11:20:39

Tags: python postgresql sqlalchemy

I have a table issue_logs:

 id | issue_id | from_status | to_status |             up_date              |  remarks  
----+----------+-------------+-----------+----------------------------------+-----------
 29 |       20 |          10 |        11 | 2018-09-14 11:43:13.907052+05:30 | UPDATED
 28 |       20 |           9 |        10 | 2018-09-14 11:42:59.612728+05:30 | UPDATED
 27 |       20 |             |         9 | 2018-09-11 17:45:35.13891+05:30  | NEW issue
 26 |       19 |           9 |        11 | 2018-09-06 16:37:05.935588+05:30 | UPDATED
 25 |       19 |             |         9 | 2018-09-06 16:27:40.543001+05:30 | NEW issue
 24 |       18 |          11 |        10 | 2018-09-05 17:13:37.568762+05:30 | UPDATED

and a table rt_status:

 id |   description    | duration_in_min 
----+------------------+-----------------
  1 | new              |               1
  2 | working          |               1
  3 | approval pending |               1
  4 | resolved         |               1
  5 | initial check    |               1
  6 | parts purchase   |               1
  7 | shipment         |               1
  8 | close            |               1
  9 | initial check    |               1
 10 | parts purchase   |               1
 11 | shipment         |               1
 12 | close            |               1

For the date range from_datetime = '2018-09-06T16:34' to to_datetime = '2018-09-14T12:27', I want to select all issues that have spent longer in a status than the duration_in_min defined for that status in the rt_status table. From issue_logs I should get the records with IDs 29, 27, and 26. For records 29 and 26, the time between their last up_date and to_datetime should be considered.

I wanted to use func.lag() with over() for this, but I cannot get the correct records. I am using PostgreSQL 9.6 and Python 2.7. How can I make func.lag() and func.lead() work using only SQLAlchemy Core?
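For reference, over() takes partition_by and order_by keyword arguments, which is the call shape needed to diff consecutive rows per issue. A minimal standalone sketch (stand-in Table definitions, modern select() calling style; not the full query from the question):

```python
from sqlalchemy import MetaData, Table, Column, Integer, DateTime, func, select

metadata = MetaData()

# Stand-in for the issue_status_logs table used below (column types assumed).
issue_status_logs = Table(
    'issue_logs', metadata,
    Column('id', Integer, primary_key=True),
    Column('issue_id', Integer),
    Column('up_date', DateTime(timezone=True)),
)

# lag() over a window partitioned per issue and ordered by timestamp yields
# the previous row's up_date; the subtraction is the time spent in a status.
prev_up_date = func.lag(issue_status_logs.c.up_date).over(
    partition_by=issue_status_logs.c.issue_id,
    order_by=issue_status_logs.c.up_date,
)

query = select(
    issue_status_logs.c.id,
    (issue_status_logs.c.up_date - prev_up_date).label('mdiff'),
)
```

Without an explicit order_by on the window, lag() picks rows in an unspecified order, which is one way to end up with the wrong records.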

What I have tried:

    s = select([
            rt_issues.c.id.label('rtissue_id'),
            rt_issues,
            rt_status.c.duration_in_min,
            rt_status.c.id.label('stage_id'),
            issue_status_logs.c.id.label('issue_log_id'),
            issue_status_logs.c.up_date.label('iss_log_update'),
            (issue_status_logs.c.up_date - func.lag(
                    issue_status_logs.c.up_date).over(
                    issue_status_logs.c.issue_id
                    )).label('mdiff'),
            ]).\
    where(and_(*conditions)).\
    select_from(rt_issues.
    outerjoin(issue_status_logs,
              rt_issues.c.id == issue_status_logs.c.issue_id).
    outerjoin(rt_status,
              issue_status_logs.c.to_status == rt_status.c.id)).\
    order_by(asc(issue_status_logs.c.up_date),
                  issue_status_logs.c.issue_id).\
    group_by(
             issue_status_logs.c.issue_id,
             rt_issues.c.id,
             issue_status_logs.c.id
             )
    rs = g.conn.execute(s)
    mcnt =  rs.rowcount
    print mcnt, 'rowcount'
    if rs.rowcount > 0:
        for r in rs:
            print dict(r)

This produces a result set containing a wrong record, namely the issue log with ID 28. Can someone help correct the error?

2 Answers:

Answer 0 (score: 1)

Even though you managed to solve the problem yourself, here is a take on it that does not use window functions (i.e. lag() or lead()). To compare the up_date timestamps of consecutive issue logs, you can self join instead. In SQL the query would look like:

select    ilx.id
from      issue_logs ilx
join      rt_status rsx on rsx.id = ilx.to_status
left join issue_logs ily on  ily.from_status = ilx.to_status
                         and ily.issue_id = ilx.issue_id
where     ilx.up_date >= '2018-09-06T16:34'
and       ilx.up_date <= ( coalesce(ily.up_date, '2018-09-14T12:27') -
                           interval '1 minute' * rsx.duration_in_min );
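The same query can be sketched in the SQLAlchemy expression language. This is a reconstruction under assumptions: minimal stand-in Table definitions (column types guessed from the sample data) and the modern select() calling style:

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String, DateTime,
                        and_, func, literal_column, select)

metadata = MetaData()

# Minimal stand-ins for the tables in the question (column types assumed).
issue_logs = Table(
    'issue_logs', metadata,
    Column('id', Integer, primary_key=True),
    Column('issue_id', Integer),
    Column('from_status', Integer),
    Column('to_status', Integer),
    Column('up_date', DateTime(timezone=True)),
    Column('remarks', String),
)
rt_status = Table(
    'rt_status', metadata,
    Column('id', Integer, primary_key=True),
    Column('description', String),
    Column('duration_in_min', Integer),
)

from_datetime = '2018-09-06T16:34'
to_datetime = '2018-09-14T12:27'

# ilx is the log row under inspection; ily is the row that follows it
# (same issue, picks up where ilx's status ended).
ilx = issue_logs.alias('ilx')
ily = issue_logs.alias('ily')

# interval '1 minute' * duration_in_min, as in the raw SQL above.
allowed = literal_column("interval '1 minute'") * rt_status.c.duration_in_min

query = (
    select(ilx.c.id)
    .select_from(
        ilx.join(rt_status, rt_status.c.id == ilx.c.to_status)
           .outerjoin(ily, and_(ily.c.from_status == ilx.c.to_status,
                                ily.c.issue_id == ilx.c.issue_id))
    )
    .where(and_(
        ilx.c.up_date >= from_datetime,
        # Rows with no successor fall back to the end of the report window.
        ilx.c.up_date <= func.coalesce(ily.c.up_date, to_datetime) - allowed,
    ))
)
```

Since the interval arithmetic is PostgreSQL-specific, the query should be executed against the PostgreSQL dialect, e.g. `conn.execute(query)` on a psycopg2-backed engine.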

Answer 1 (score: 0)

My solution uses a modified SQLAlchemy expression language query:

s = select([
        rt_issues.c.id.label('rtissue_id'),
        rt_issues.c.title,
        rt_status.c.duration_in_min,
        rt_status.c.is_last_status,
        rt_status.c.id.label('stage_id'),
        issue_status_logs.c.id.label('issue_log_id'),
        issue_status_logs.c.up_date.label('iss_log_update'),
        (issue_status_logs.c.up_date - func.lag(
                issue_status_logs.c.up_date).over(
                issue_status_logs.c.issue_id)).
        label('mdiff'),
        (func.lead(
                issue_status_logs.c.issue_id).over(
                issue_status_logs.c.issue_id
                )).label('next_id'),
        (func.lead(
                issue_status_logs.c.up_date).over(
                issue_status_logs.c.issue_id,
                issue_status_logs.c.up_date,
                )).label('prev_up_date'),
        issue_status_logs.c.user_id,
        (users.c.first_name + ' ' + users.c.last_name).
        label('updated_by_user'),
        ]).\
    where(and_(*conditions)).\
    select_from(rt_issues.
    outerjoin(issue_status_logs,
              rt_issues.c.id == issue_status_logs.c.issue_id).
    outerjoin(users, issue_status_logs.c.user_id == users.c.id).
    outerjoin(rt_status,
              issue_status_logs.c.to_status == rt_status.c.id)).\
    order_by(issue_status_logs.c.issue_id,
             asc(issue_status_logs.c.up_date)).\
    group_by(
             issue_status_logs.c.issue_id,
             rt_issues.c.id,
             issue_status_logs.c.id,
             rt_status.c.id,
             users.c.id
             )
rs = g.conn.execute(s)
if rs.rowcount > 0:
    for r in rs:
        # IMPT: For issue with no last status
        if not r[rt_status.c.is_last_status]:
            if not r['mdiff'] and (not r['next_id']):
                n = (mto_dt - r['iss_log_update'].replace(tzinfo=None))
            elif ((not r['mdiff']) and
                  (r['next_id'] == r['rtissue_id'])):
                n = (r['prev_up_date'] - r['iss_log_update'])
            else:
                n = (r['mdiff'])
            n =  (n.total_seconds()/60)
            if n > r[rt_status.c.duration_in_min]:
                mx = dict(r)
                q_user_wise_pendency_list.append(mx)

    for t in q_user_wise_pendency_list:
        if not t in temp_list:
            temp_list.append(t)
    q_user_wise_pendency_list = temp_list
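The core of the post-processing loop above is a timedelta comparison: time spent in a status (falling back to the end of the report window for an issue's newest log row, the `mto_dt` case) versus the allowed duration_in_min. A standalone sketch of just that check, with illustrative names not taken from the query:

```python
from datetime import datetime

# End of the report window (to_datetime from the question).
WINDOW_END = datetime(2018, 9, 14, 12, 27)

def exceeds_duration(entered_at, left_at, duration_in_min):
    """Return True if time spent in a status exceeds the allowed minutes.

    left_at is None for the newest log row of an issue; in that case the
    end of the report window is used instead, as in the loop above.
    """
    spent = (left_at or WINDOW_END) - entered_at
    return spent.total_seconds() / 60 > duration_in_min

# Issue 20 entered status 11 at 11:43:13 and never left it; by 12:27 it
# has been there roughly 44 minutes, more than the allowed 1 minute.
print(exceeds_duration(datetime(2018, 9, 14, 11, 43, 13), None, 1))  # True
```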