Question

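I have a scenario and I'm hoping there is a more efficient way to write the code. Here we go.

Suppose there is a table named ticket_thread with the following fields:

threadID
ticketID
threadType - can be c2s, s2s, s2c
postTime - datetime
message

All data is sorted by ticketID, then by postTime.

My job is to determine the time taken from each c2s to the following s2c, i.e. the response time.

My current approach is to dump the filtered table into two lists: c2s and s2c.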

while (!empty($c2s)) {

  // popping first record from c2s
  $c2sRecord = array_shift($c2s);

  if (empty($s2c) || $c2sRecord['ticketID'] != $s2c[0]['ticketID']) {

    // cannot find a response to the ticket
    echo $c2sRecord['ticketID'] . "<br>";

  } else {

    echo $c2sRecord['ticketID'];

    // popping first response from s2c
    $s2cRecord = array_shift($s2c);

    // print out the response time
    // (assumes the postTime values are DateTime objects)
    echo " " . date_diff($c2sRecord['postTime'], $s2cRecord['postTime'])->format('%H:%I:%S') . "<br>";

    $filter = true;
    while ($filter) {

      // checking the next record in c2s: stop if there is none, if it is a different ticket
      // OR the new post is placed AFTER service has responded.
      if (empty($c2s)
          or ($c2s[0]['ticketID'] != $s2cRecord['ticketID'])
          or ($c2s[0]['postTime'] > $s2cRecord['postTime'])) {

        // stops the filter
        $filter = false;

      } else {

        // pop out unneeded records (supplementary questions)
        $c2sRecord = array_shift($c2s);

      }
    }
  }
}

My problem is that this takes too long. Is there a faster way, using SQL operations, to generate something like the following?

table generated from SQL

ticket_id | c2sTime  | s2cTime  | timeTaken | rank
  0012    | 12:20:20 | 12:30:20 | 00:10:00  |   1
  0012    | 12:40:00 | 12:55:30 | 00:15:30  |   2
  0012    | 13:10:00 |   null   |   null    |   3
  0013    | 12:20:20 |   null   |   null    |   1

Edit: sample table, as requested

threadID | ticketID | threadType | postTime | message
  3012   |   0012   |    c2s     | 12:20:20 | customer A's 1st post
  3014   |   0012   |    c2s     | 12:20:30 | Added info to A's 1st post, should not be included
  3015   |   0012   |    s2c     | 12:30:20 | Support responding to A's 1st post
  3016   |   0012   |    s2s     | 12:30:30 | internal chat, should not be included
  3017   |   0012   |    s2s     | 12:30:40 | internal chat, should not be included
  3018   |   0012   |    c2s     | 12:40:00 | A's 2nd post
  3019   |   0012   |    s2c     | 12:55:30 | Support responding to A's 2nd post
  3020   |   0012   |    s2c     | 13:00:00 | Added info to Support's 2nd response, should not be included
  3021   |   0012   |    c2s     | 13:10:00 | A's 3rd post
  3013   |   0013   |    c2s     | 12:20:20 | customer B's 1st post

Answer 1

If all window functions supported the FILTER() clause (as the aggregate-based variants do), your task would be much simpler. I.e. all you would need is:

-- won't work, unfortunately
first_value(post_time) filter (where thread_type = 's2c')
                         over (partition by ticket_id
                               order by post_time
                               rows between current row and unbounded following)
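To make the intent of that hypothetical expression concrete, here is a minimal sketch in Python (not part of the original answer). It assumes the rows are already sorted by ticket_id and post_time; the function name `next_s2c_times` and the tuple layout are made up for illustration. Scanning backwards, it records for every row the post time of the first s2c row at or after it within the same ticket, which is what the filtered first_value() over "current row and unbounded following" would return.

```python
def next_s2c_times(rows):
    """rows: (thread_id, ticket_id, thread_type, post_time) tuples,
    sorted by ticket_id, then post_time (an assumption of this sketch).
    Returns, per row, the post_time of the first s2c row at or after it
    in the same ticket, or None if there is none."""
    result = [None] * len(rows)
    current_ticket = None
    next_time = None
    # Walk backwards so the "next" s2c time is always already known.
    for i in range(len(rows) - 1, -1, -1):
        _, ticket_id, thread_type, post_time = rows[i]
        if ticket_id != current_ticket:
            # New partition: forget the s2c time of the other ticket.
            current_ticket = ticket_id
            next_time = None
        if thread_type == "s2c":
            next_time = post_time
        result[i] = next_time
    return result


sample = [
    (3012, 12, "c2s", "12:20:20"),
    (3014, 12, "c2s", "12:20:30"),
    (3015, 12, "s2c", "12:30:20"),
    (3018, 12, "c2s", "12:40:00"),
    (3021, 12, "c2s", "13:10:00"),
    (3013, 13, "c2s", "12:20:20"),
]
times = next_s2c_times(sample)
```
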

Until then, you can use a self-join:

select  t.*, row_number() over (partition by t.ticket_id order by t.c2s_time) rank
from    (select    distinct on (coalesce(s2c.thread_id, c2s.thread_id))
                   c2s.ticket_id,
                   c2s.post_time c2s_time,
                   c2s.message c2s_message,
                   s2c.post_time s2c_time,
                   s2c.message s2c_message,
                   s2c.post_time - c2s.post_time time_taken
         from      ticket_thread c2s
         left join ticket_thread s2c on  c2s.ticket_id = s2c.ticket_id
                                     and s2c.thread_type = 's2c'
                                     and c2s.post_time < s2c.post_time
                                     and not exists(select 1
                                                    from   ticket_thread
                                                    where  post_time > c2s.post_time
                                                    and    post_time < s2c.post_time
                                                    and    ticket_id = c2s.ticket_id
                                                    and    thread_type = 's2c')
         where     c2s.thread_type = 'c2s'
         order by  coalesce(s2c.thread_id, c2s.thread_id), c2s.post_time) t
order by t.ticket_id, t.c2s_time;

Or, you can use array_agg() as a window function:

select  t.*, row_number() over (partition by t.ticket_id order by t.c2s_time) rank
from    (select    distinct on (coalesce((m).thread_id, (t).thread_id))
                   (t).ticket_id,
                   (t).post_time c2s_time,
                   (t).message c2s_message,
                   (m).post_time s2c_time,
                   (m).message s2c_message,
                   (m).post_time - (t).post_time time_taken
         from      (select t, array_agg(t) filter (where thread_type = 's2c')
                                             over (partition by ticket_id
                                                   order by     post_time
                                                   rows between current row and unbounded following) a
                    from   ticket_thread t) t
         left join lateral  (select   m
                             from     unnest(a) m
                             order by (m).post_time
                             limit    1) m on true
         where     (t).thread_type = 'c2s'
         order by  coalesce((m).thread_id, (t).thread_id), (t).post_time) t
order by t.ticket_id, t.c2s_time;

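In my own tests, the self-join variant seems to be somewhat faster. It can also use an index on (ticket_id, post_time). (But you should test both if performance matters to you.)

Or, you can add the missing functionality yourself (i.e. create a first_agg aggregate and use it as a window function):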

create or replace function first_agg_val(anyelement, anyelement)
  returns anyelement
  language sql
  immutable
  strict
  as 'select $1';

create aggregate first_agg(
  sfunc    = first_agg_val,
  basetype = anyelement,
  stype    = anyelement
);

select  t.*, row_number() over (partition by t.ticket_id order by t.c2s_time) rank
from    (select    distinct on (coalesce((s2c).thread_id, (c2s).thread_id))
                   (c2s).ticket_id,
                   (c2s).post_time c2s_time,
                   (c2s).message c2s_message,
                   (s2c).post_time s2c_time,
                   (s2c).message s2c_message,
                   (s2c).post_time - (c2s).post_time time_taken
         from      (select t c2s, first_agg(t) filter (where thread_type = 's2c')
                                                 over (partition by ticket_id
                                                       order by     post_time
                                                       rows between current row and unbounded following) s2c
                    from   ticket_thread t) t
         where     (c2s).thread_type = 'c2s'
         order by  coalesce((s2c).thread_id, (c2s).thread_id), (c2s).post_time) t
order by t.ticket_id, t.c2s_time;

If you don't need rank, you can drop the outer queries (they exist only for rank). (It is usually easy to calculate on the client side instead.)
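As a rough illustration of computing rank on the client, here is a minimal Python sketch (not part of the original answer). It assumes the rows arrive as dicts already sorted by ticket_id and c2s_time; the helper name `add_rank` is made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def add_rank(rows):
    """rows: dicts sorted by ticket_id, then c2s_time (assumed).
    Returns new dicts with a 1-based 'rank' that restarts per ticket."""
    ranked = []
    for _, group in groupby(rows, key=itemgetter("ticket_id")):
        for rank, row in enumerate(group, start=1):
            ranked.append({**row, "rank": rank})
    return ranked


rows = [
    {"ticket_id": 12, "c2s_time": "12:20:20"},
    {"ticket_id": 12, "c2s_time": "12:40:00"},
    {"ticket_id": 12, "c2s_time": "13:10:00"},
    {"ticket_id": 13, "c2s_time": "12:20:20"},
]
ranked = add_rank(rows)
```
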

http://rextester.com/BUY9309

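PS: the time_taken column in my queries is an interval. If you don't like that or cannot parse it, you can use the following formula to get the time difference in seconds:

extract(epoch from <interval expression>)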
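For comparison, the same seconds arithmetic can be checked on the client. This is a minimal Python sketch (not part of the original answer); it assumes the times arrive as HH:MM:SS strings on the same day, and the helper name `seconds_between` is made up for illustration.

```python
from datetime import datetime


def seconds_between(c2s_time, s2c_time, fmt="%H:%M:%S"):
    """Difference between two same-day time strings, in seconds."""
    delta = datetime.strptime(s2c_time, fmt) - datetime.strptime(c2s_time, fmt)
    return delta.total_seconds()


# Second exchange of ticket 0012 in the sample data: 15 min 30 s.
taken = seconds_between("12:40:00", "12:55:30")
```
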

Answer 2

An alternative solution, mainly using window functions:

select ticketid, c2stime, s2ctime, s2ctime- c2stime as timetaken, rank() over w
from (
    select ticketid, threadtype, posttime as c2stime, lead(posttime) over w as s2ctime
    from (
        select *, lag(threadtype) over w
        from ticket_thread
        where threadtype <> 's2s'
        window w as (partition by ticketid order by threadid)
        ) s
    where threadtype <> coalesce(lag, '')
    window w as (partition by ticketid order by threadid)
    ) s
where threadtype = 'c2s'
window w as (partition by ticketid order by c2stime)
order by ticketid, c2stime;

 ticketid | c2stime  | s2ctime  | timetaken | rank 
----------+----------+----------+-----------+------
       12 | 12:20:20 | 12:30:20 | 00:10:00  |    1
       12 | 12:40:00 | 12:55:30 | 00:15:30  |    2
       12 | 13:10:00 |          |           |    3
       13 | 12:20:20 |          |           |    1
(4 rows)
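The three nested selects can be read as three passes over the data. The following Python sketch (not part of the original answer) mirrors them on the sample rows from the question, under the assumption that threadid order matches posting order within a ticket; the function name `response_pairs` and the tuple layout are made up for illustration.

```python
from itertools import groupby


def response_pairs(rows):
    """rows: (threadid, ticketid, threadtype, posttime) tuples.
    Returns (ticketid, c2stime, s2ctime, rank) tuples."""
    result = []
    # Pass 1: drop internal chat and order by ticket, then thread id.
    visible = sorted((r for r in rows if r[2] != "s2s"),
                     key=lambda r: (r[1], r[0]))
    for ticketid, group in groupby(visible, key=lambda r: r[1]):
        # Pass 2: keep a row only if its type differs from the previous
        # row's type (the lag() comparison), i.e. the first of each run.
        runs = []
        prev_type = ""
        for row in group:
            if row[2] != prev_type:
                runs.append(row)
            prev_type = row[2]
        # Pass 3: pair each c2s row with the row after it (the lead()).
        rank = 0
        for i, row in enumerate(runs):
            if row[2] != "c2s":
                continue
            rank += 1
            s2ctime = runs[i + 1][3] if i + 1 < len(runs) else None
            result.append((ticketid, row[3], s2ctime, rank))
    return result


sample = [
    (3012, 12, "c2s", "12:20:20"), (3014, 12, "c2s", "12:20:30"),
    (3015, 12, "s2c", "12:30:20"), (3016, 12, "s2s", "12:30:30"),
    (3017, 12, "s2s", "12:30:40"), (3018, 12, "c2s", "12:40:00"),
    (3019, 12, "s2c", "12:55:30"), (3020, 12, "s2c", "13:00:00"),
    (3021, 12, "c2s", "13:10:00"), (3013, 13, "c2s", "12:20:20"),
]
pairs = response_pairs(sample)
```

Because consecutive rows of the same type are collapsed in the second pass, the remaining rows of a ticket alternate between c2s and s2c, so the row after a c2s row is always the first response to it.
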

Efficient way to determine the time between posts?