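I have a scenario and I am wondering whether there is a more efficient way to optimise my code. Here we go.
Suppose there is a table named ticket_thread with the fields threadID, ticketID, threadType, postTime and message (see the sample table in the edit below).
All data is sorted by ticketID, then by postTime.
My job is to determine how long each c2s post (from the customer) takes to get an s2c reply (from support), i.e. the response time.
My current approach is to dump the filtered table into two lists, c2s and s2c: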
    while (!empty($c2s)) {
        // pop the first record from c2s
        $c2sRecord = array_shift($c2s);
        // discard responses that cannot answer this post
        // (they belong to an earlier ticket, or were posted before this post)
        while (!empty($s2c)
            && ($s2c[0]['ticketID'] < $c2sRecord['ticketID']
                || ($s2c[0]['ticketID'] == $c2sRecord['ticketID']
                    && $s2c[0]['postTime'] < $c2sRecord['postTime']))) {
            array_shift($s2c);
        }
        if (empty($s2c) || $c2sRecord['ticketID'] != $s2c[0]['ticketID']) {
            // cannot find a response to the ticket
            echo $c2sRecord['ticketID'] . "<br>";
        } else {
            echo $c2sRecord['ticketID'];
            // pop the first response from s2c
            $s2cRecord = array_shift($s2c);
            // print out the response time
            // (postTime is assumed to be a string that DateTime can parse)
            echo " " . date_diff(new DateTime($c2sRecord['postTime']),
                                 new DateTime($s2cRecord['postTime']))->format('%H:%I:%S') . "<br>";
            // pop out unneeded records (supplementary questions): keep going while
            // the next c2s record is for the same ticket and was NOT posted AFTER
            // service has responded
            while (!empty($c2s)
                && $c2s[0]['ticketID'] == $s2cRecord['ticketID']
                && $c2s[0]['postTime'] <= $s2cRecord['postTime']) {
                array_shift($c2s);
            }
        }
    }
My problem is that this takes far too long. Is there a faster way, using SQL operations, to generate something like the following?
Table generated from SQL:
    ticket_id | c2sTime  | s2cTime  | timeTaken | rank
    0012      | 12:20:20 | 12:30:20 | 00:10:00  | 1
    0012      | 12:40:00 | 12:55:30 | 00:15:30  | 2
    0012      | 13:10:00 | null     | null      | 3
    0013      | 12:20:20 | null     | null      | 1
Edit: sample table, as requested
    threadID | ticketID | threadType | postTime | message
    3012     | 0012     | c2s        | 12:20:20 | customer A's 1st post
    3014     | 0012     | c2s        | 12:20:30 | Added info to A's 1st post, should not be included
    3015     | 0012     | s2c        | 12:30:20 | Support responding to A's 1st post
    3016     | 0012     | s2s        | 12:30:30 | internal chat, should not be included
    3017     | 0012     | s2s        | 12:30:40 | internal chat, should not be included
    3018     | 0012     | c2s        | 12:40:00 | A's 2nd post
    3019     | 0012     | s2c        | 12:55:30 | Support responding to A's 2nd post
    3020     | 0012     | s2c        | 13:00:00 | Added info to Support's 2nd response, should not be included
    3021     | 0012     | c2s        | 13:10:00 | A's 3rd post
    3013     | 0013     | c2s        | 12:20:20 | customer B's 1st post
Answer 0 (score: 1)
If all window functions supported the FILTER clause (as the aggregate-based variants do), your task could be much simpler. I.e. all you would need is:
    -- won't work, unfortunately
    first_value(post_time) filter (where thread_type = 's2c')
                           over (partition by ticket_id
                                 order by post_time
                                 rows between current row and unbounded following)
Until then, you can use a self-join:
    select t.*, row_number() over (partition by t.ticket_id order by t.c2s_time) rank
    from (select distinct on (coalesce(s2c.thread_id, c2s.thread_id))
                 c2s.ticket_id,
                 c2s.post_time c2s_time,
                 c2s.message c2s_message,
                 s2c.post_time s2c_time,
                 s2c.message s2c_message,
                 s2c.post_time - c2s.post_time time_taken
          from ticket_thread c2s
          left join ticket_thread s2c on c2s.ticket_id = s2c.ticket_id
                                     and s2c.thread_type = 's2c'
                                     and c2s.post_time < s2c.post_time
                                     and not exists(select 1
                                                    from ticket_thread
                                                    where post_time > c2s.post_time
                                                      and post_time < s2c.post_time
                                                      and ticket_id = c2s.ticket_id
                                                      and thread_type = 's2c')
          where c2s.thread_type = 'c2s'
          order by coalesce(s2c.thread_id, c2s.thread_id), c2s.post_time) t
    order by t.ticket_id, t.c2s_time;
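To try the queries in this answer against the sample rows from the question, a setup along the following lines could be used. It is only a sketch: the snake_case column names mirror the ones used in this answer, while the column types (integer, text, time) are assumptions, since the question does not state them.

```sql
-- Assumed schema: the column types are guesses, the question does not specify them.
create temporary table ticket_thread (
    thread_id   integer primary key,
    ticket_id   text not null,
    thread_type text not null,   -- 'c2s', 's2c' or 's2s'
    post_time   time not null,
    message     text
);

-- Sample rows copied from the question's edit.
insert into ticket_thread (thread_id, ticket_id, thread_type, post_time, message) values
    (3012, '0012', 'c2s', '12:20:20', 'customer A''s 1st post'),
    (3014, '0012', 'c2s', '12:20:30', 'Added info to A''s 1st post, should not be included'),
    (3015, '0012', 's2c', '12:30:20', 'Support responding to A''s 1st post'),
    (3016, '0012', 's2s', '12:30:30', 'internal chat, should not be included'),
    (3017, '0012', 's2s', '12:30:40', 'internal chat, should not be included'),
    (3018, '0012', 'c2s', '12:40:00', 'A''s 2nd post'),
    (3019, '0012', 's2c', '12:55:30', 'Support responding to A''s 2nd post'),
    (3020, '0012', 's2c', '13:00:00', 'Added info to Support''s 2nd response, should not be included'),
    (3021, '0012', 'c2s', '13:10:00', 'A''s 3rd post'),
    (3013, '0013', 'c2s', '12:20:20', 'customer B''s 1st post');
```

A temporary table is used so that the experiment disappears at the end of the session and does not touch any real ticket_thread table.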
Alternatively, you can use array_agg() as a window function:
    select t.*, row_number() over (partition by t.ticket_id order by t.c2s_time) rank
    from (select distinct on (coalesce((m).thread_id, (t).thread_id))
                 (t).ticket_id,
                 (t).post_time c2s_time,
                 (t).message c2s_message,
                 (m).post_time s2c_time,
                 (m).message s2c_message,
                 (m).post_time - (t).post_time time_taken
          from (select t, array_agg(t) filter (where thread_type = 's2c')
                                       over (partition by ticket_id
                                             order by post_time
                                             rows between current row and unbounded following) a
                from ticket_thread t) t
          left join lateral (select m
                             from unnest(a) m
                             order by (m).post_time
                             limit 1) m on true
          where (t).thread_type = 'c2s'
          order by coalesce((m).thread_id, (t).thread_id), (t).post_time) t
    order by t.ticket_id, t.c2s_time;
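From my own testing, the self-join variant appears to be somewhat faster. It can also make use of an index on (ticket_id, post_time). (But you should test both if performance matters to you.)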
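The index mentioned above could be created roughly as follows. The index name is made up for illustration, and the statements assume that a ticket_thread table with the snake_case columns used in this answer already exists, with post_time stored as a time value. Prefixing a query with explain (analyze) is the usual PostgreSQL way to see whether the planner actually uses the index and how long the query really takes.

```sql
-- Hypothetical index name; assumes ticket_thread(ticket_id, post_time, ...) already exists.
create index ticket_thread_ticket_id_post_time_idx
    on ticket_thread (ticket_id, post_time);

-- Refresh planner statistics so the new index is costed realistically.
analyze ticket_thread;

-- explain (analyze) runs the query and reports the chosen plan with actual timings.
-- The literal values below are placeholders taken from the question's sample data.
explain (analyze)
select *
from ticket_thread
where ticket_id = '0012'
  and post_time > '12:20:20';
```

On a table with only a handful of rows the planner will usually prefer a sequential scan regardless, so the comparison between the variants is only meaningful on data of realistic size.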
Or you can add the missing functionality yourself (i.e. create a first_agg aggregate and use it as a window function):
    create or replace function first_agg_val(anyelement, anyelement)
      returns anyelement
      language sql
      immutable
      strict
    as 'select $1';

    create aggregate first_agg(
      sfunc = first_agg_val,
      basetype = anyelement,
      stype = anyelement
    );

    select t.*, row_number() over (partition by t.ticket_id order by t.c2s_time) rank
    from (select distinct on (coalesce((s2c).thread_id, (c2s).thread_id))
                 (c2s).ticket_id,
                 (c2s).post_time c2s_time,
                 (c2s).message c2s_message,
                 (s2c).post_time s2c_time,
                 (s2c).message s2c_message,
                 (s2c).post_time - (c2s).post_time time_taken
          from (select t c2s, first_agg(t) filter (where thread_type = 's2c')
                                           over (partition by ticket_id
                                                 order by post_time
                                                 rows between current row and unbounded following) s2c
                from ticket_thread t) t
          where (c2s).thread_type = 'c2s'
          order by coalesce((s2c).thread_id, (c2s).thread_id), (c2s).post_time) t
    order by t.ticket_id, t.c2s_time;
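If you don't need rank, you can drop the outer queries (they exist only for the sake of rank). (It is usually easy to calculate on the client side instead.)
PS: the time_taken column in my queries is an interval. If you don't like that, or cannot parse it, you can use the following formula to get the time difference in seconds:
    extract(epoch from <interval expression>)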
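As a quick illustration of the formula, here it is applied to two times from the question's sample data, written as literals. The seconds_taken alias is made up for the example.

```sql
-- time - time yields an interval; extract(epoch from ...) converts it to seconds.
select extract(epoch from (time '12:30:20' - time '12:20:20')) as seconds_taken;
-- the interval is 10 minutes, i.e. 600 seconds
```

In the queries above, the same idea would mean wrapping the subtraction that produces time_taken, for example extract(epoch from s2c.post_time - c2s.post_time) in the self-join variant.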
Answer 1 (score: 1)
An alternative solution that relies mainly on window functions:
    select ticketid, c2stime, s2ctime, s2ctime - c2stime as timetaken, rank() over w
    from (
        select ticketid, threadtype, posttime as c2stime, lead(posttime) over w as s2ctime
        from (
            select *, lag(threadtype) over w
            from ticket_thread
            where threadtype <> 's2s'
            window w as (partition by ticketid order by threadid)
            ) s
        where threadtype <> coalesce(lag, '')
        window w as (partition by ticketid order by threadid)
        ) s
    where threadtype = 'c2s'
    window w as (partition by ticketid order by c2stime)
    order by ticketid, c2stime;
     ticketid | c2stime  | s2ctime  | timetaken | rank
    ----------+----------+----------+-----------+------
           12 | 12:20:20 | 12:30:20 | 00:10:00  |    1
           12 | 12:40:00 | 12:55:30 | 00:15:30  |    2
           12 | 13:10:00 |          |           |    3
           13 | 12:20:20 |          |           |    1
    (4 rows)