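I was asked this question in an interview. The trips table has the following columns (customer_id, start_from, end_at, start_at_time, end_at_time). The data is structured so that each trip is stored as a separate row; the original question showed a sample of the table at this point.

Can you find the list of all customers who started yesterday from point A and ended yesterday at point P?

I proposed a solution using a window function: it identifies all customers whose first trip yesterday started from A, then inner joins that list with the customers whose last trip yesterday ended at P (found with the same window function).

The solution I gave was: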
SELECT a.customer_id
FROM
  (SELECT a.customer_id
   FROM
     (SELECT customer_id,
             start_from,
             row_number() OVER (PARTITION BY customer_id
                                ORDER BY start_at_time ASC) AS rnk
      FROM trips
      WHERE to_date(start_at_time) = date_sub(CURRENT_DATE, 1)) AS a
   WHERE a.rnk = 1
     AND a.start_from = 'A') AS a
INNER JOIN
  (SELECT a.customer_id
   FROM
     (SELECT customer_id,
             end_at,
             row_number() OVER (PARTITION BY customer_id
                                ORDER BY end_at_time DESC) AS rnk
      FROM trips
      WHERE to_date(end_at_time) = date_sub(CURRENT_DATE, 1)) AS a
   WHERE a.rnk = 1
     AND a.end_at = 'P') AS b ON a.customer_id = b.customer_id
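My interviewer said my solution was correct, but that there is a more efficient way to solve the problem. I have been searching and trying to find a more efficient approach, but so far I have not found one. Can you suggest a more efficient way to solve this?

Answer 0 (score: 1)

I would probably use first_value() for this: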
select t.customer_id
from (select t.*,
             first_value(start_from) over (partition by customer_id order by start_at_time) as first_start,
             first_value(end_at) over (partition by customer_id order by start_at_time desc) as last_end
      from trips t
      where start_at_time >= date_sub(CURRENT_DATE, 1) and
            start_at_time < CURRENT_DATE
     ) t
where first_start = start_from and  -- just some filtering so select distinct is not needed
      first_start = 'A' and
      last_end = 'P';
I should add that many databases support equivalent aggregation functionality, and where it is available I would use that instead.
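This assumes that a customer's starting point does not repeat within the day; otherwise a customer who starts from A more than once is returned more than once. To be safe you can add select distinct, but that comes with a performance penalty.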
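One way the aggregate form mentioned above might look is sketched below. It is only an illustration, not the answerer's own query: it runs against an in-memory SQLite database through Python's sqlite3 module, the sample rows are invented, and the day boundaries are passed as parameters instead of using CURRENT_DATE so that the result is reproducible. The trick of concatenating the timestamp with the location and taking MIN/MAX is a stand-in for engine-specific "value at min/max" aggregates, and it assumes fixed-width 'YYYY-MM-DD HH:MM:SS' timestamps.

```python
import sqlite3

# Hypothetical sample data; timestamps use the fixed-width 'YYYY-MM-DD HH:MM:SS' form.
SAMPLE_TRIPS = [
    (1, "Z", "Z", "2023-12-31 23:00:00", "2023-12-31 23:30:00"),  # previous day
    (1, "A", "B", "2024-01-01 08:00:00", "2024-01-01 09:00:00"),
    (1, "B", "P", "2024-01-01 10:00:00", "2024-01-01 11:00:00"),
    (2, "C", "A", "2024-01-01 07:00:00", "2024-01-01 08:00:00"),  # first start is C
    (2, "A", "P", "2024-01-01 09:00:00", "2024-01-01 10:00:00"),
    (3, "A", "P", "2024-01-01 12:00:00", "2024-01-01 13:00:00"),  # single trip A -> P
    (4, "A", "Q", "2024-01-01 06:00:00", "2024-01-01 07:00:00"),
    (4, "Q", "A", "2024-01-01 08:00:00", "2024-01-01 09:00:00"),  # last end is A
]

# One pass over the table, one GROUP BY, no window functions and no join.
# MIN(ts || '|' || value) sorts by timestamp first, so the value attached to the
# earliest (or, with MAX, the latest) timestamp can be cut out with substr().
# The timestamp is 19 characters and '|' is the 20th, so the value starts at 21.
QUERY = """
SELECT customer_id
FROM trips
WHERE start_at_time >= :day_start AND start_at_time < :day_end
GROUP BY customer_id
HAVING substr(MIN(start_at_time || '|' || start_from), 21) = 'A'
   AND substr(MAX(start_at_time || '|' || end_at), 21) = 'P'
ORDER BY customer_id
"""


def build_db():
    """Create an in-memory trips table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (customer_id INTEGER, start_from TEXT, end_at TEXT, "
        "start_at_time TEXT, end_at_time TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", SAMPLE_TRIPS)
    return conn


def customers_a_to_p(conn, day_start, day_end):
    """Customers whose first trip in [day_start, day_end) starts at A and whose last ends at P."""
    rows = conn.execute(QUERY, {"day_start": day_start, "day_end": day_end})
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(customers_a_to_p(build_db(), "2024-01-01 00:00:00", "2024-01-02 00:00:00"))
```

Because each customer collapses to a single group, no select distinct is needed. Like the first_value() query, this orders trips by start_at_time only, so a trip that starts inside the window but ends after it still counts as the last trip.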
Answer 1 (score: 0)

A generalized version of what I would probably do:
SELECT fandl.a
FROM (
    SELECT a, MIN(start) AS t0, MAX(start) AS tN
    FROM someTable
    WHERE start >= DATE_SUB(CURRENT_DATE, 1) AND start < CURRENT_DATE
    GROUP BY a
) AS fandl
INNER JOIN someTable AS st0 ON fandl.a = st0.a AND fandl.t0 = st0.start
INNER JOIN someTable AS stN ON fandl.a = stN.a AND fandl.tN = stN.start
WHERE st0.b1 = 'A' AND stN.b2 = 'P'
;
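I used the same date functions you did, since you did not specify the SQL dialect.

Note that in many RDBMSs, if there is an index on (a, start), the subquery and the joins can be carried out using the index alone. Only the final WHERE evaluation requires actual table access.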
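To try this out, here is a minimal sketch that maps the generalized query onto the question's trips table (a becomes customer_id, start becomes start_at_time, b1 becomes start_from, b2 becomes end_at) and creates the suggested index. It uses SQLite through Python's sqlite3 module purely as a convenient test bed. The sample rows and the index name idx_customer_start are assumptions, and the day boundaries are parameters so the run is reproducible. Whether the planner actually reads only the index depends on the engine and its version, so the plan is printed for inspection and no particular output is promised.

```python
import sqlite3

# Hypothetical sample rows: (customer_id, start_from, end_at, start_at_time, end_at_time).
SAMPLE_TRIPS = [
    (1, "A", "B", "2024-01-01 08:00:00", "2024-01-01 09:00:00"),
    (1, "B", "P", "2024-01-01 10:00:00", "2024-01-01 11:00:00"),
    (2, "C", "A", "2024-01-01 07:00:00", "2024-01-01 08:00:00"),
    (2, "A", "P", "2024-01-01 09:00:00", "2024-01-01 10:00:00"),
    (3, "A", "P", "2024-01-01 12:00:00", "2024-01-01 13:00:00"),
]

# The generalized query, rewritten for the trips schema.
QUERY = """
SELECT fandl.customer_id
FROM (
    SELECT customer_id, MIN(start_at_time) AS t0, MAX(start_at_time) AS tn
    FROM trips
    WHERE start_at_time >= ? AND start_at_time < ?
    GROUP BY customer_id
) AS fandl
INNER JOIN trips AS st0
    ON st0.customer_id = fandl.customer_id AND st0.start_at_time = fandl.t0
INNER JOIN trips AS stn
    ON stn.customer_id = fandl.customer_id AND stn.start_at_time = fandl.tn
WHERE st0.start_from = 'A' AND stn.end_at = 'P'
ORDER BY fandl.customer_id
"""


def build_db():
    """Create the trips table, the (customer_id, start_at_time) index, and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (customer_id INTEGER, start_from TEXT, end_at TEXT, "
        "start_at_time TEXT, end_at_time TEXT)"
    )
    # Index name is an assumption; the column order follows the (a, start) suggestion.
    conn.execute("CREATE INDEX idx_customer_start ON trips (customer_id, start_at_time)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", SAMPLE_TRIPS)
    return conn


def first_a_last_p(conn, day_start, day_end):
    """Run the min/max-and-join query for the half-open window [day_start, day_end)."""
    return [r[0] for r in conn.execute(QUERY, (day_start, day_end))]


def query_plan(conn, day_start, day_end):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (day_start, day_end))
    return [r[3] for r in rows]


if __name__ == "__main__":
    db = build_db()
    window = ("2024-01-01 00:00:00", "2024-01-02 00:00:00")
    print(first_a_last_p(db, *window))
    for step in query_plan(db, *window):
        print(step)
```

If two trips of the same customer share a start_at_time, the joins return that customer more than once, which is the same caveat about repeated starts noted for the first_value() query.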