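I'm using Redshift and need an alternative to correlated subqueries, because I'm getting a "correlated subquery not supported" error. For this particular exercise, which tries to identify all sales transactions made by the same customer within a given period after an original transaction, I'm not sure a traditional left join would work either, since the query depends on the context (the current value) of the parent select. I've also tried something similar with the row_number() window function, but again I need a way to window/partition on a date range, not just on customer_id.
The overall goal is to find the first sales transaction for a given customer ID, then find all subsequent transactions made within 60 minutes of that first transaction. The same logic then continues for that customer's remaining transactions (and ultimately for all customers in the database). That is, once the initial 60-minute window has been established starting from the time of the first transaction, a second 60-minute window begins when the first one ends, all transactions within the second window are identified and grouped, and the process repeats for the remaining transactions.
The output would list the first transaction ID that starts a 60-minute window, followed by the other transaction IDs created within that window. The second row would show the first transaction ID made by the same customer in the next 60-minute window (again, the first transaction after the first 60-minute window marks the start of the second window), followed by the subsequent transactions made within that second window.
An example of the query in its most basic form is shown below: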
select
    s1.customer_id,
    s1.transaction_id,
    s1.order_time,
    -- 1st transaction within 60 minutes of s1
    (
        select
            s2.transaction_id
        from
            sales s2
        where
            s2.order_time > s1.order_time and
            s2.order_time <= dateadd(m,60,s1.order_time) and
            s2.customer_id = s1.customer_id
        order by
            s2.order_time asc
        limit 1
    ) as sales_transaction_id_1,
    -- 2nd transaction within 60 minutes of s1
    (
        select
            s3.transaction_id
        from
            sales s3
        where
            s3.order_time > s1.order_time and
            s3.order_time <= dateadd(m,60,s1.order_time) and
            s3.customer_id = s1.customer_id
        order by
            s3.order_time asc
        limit 1 offset 1
    ) as sales_transaction_id_2,
    -- 3rd transaction within 60 minutes of s1
    (
        select
            s4.transaction_id
        from
            sales s4
        where
            s4.order_time > s1.order_time and
            s4.order_time <= dateadd(m,60,s1.order_time) and
            s4.customer_id = s1.customer_id
        order by
            s4.order_time asc
        limit 1 offset 2
    ) as sales_transaction_id_3
from
    (
        select
            sales.customer_id,
            sales.transaction_id,
            sales.order_time
        from
            sales
        order by
            sales.order_time desc
    ) s1;
For example, if a customer made the following transactions:
customer_id  transaction_id  order_time
1234         33453           2017-06-05 13:30
1234         88472           2017-06-05 13:45
1234         88477           2017-06-05 14:10
1234         99321           2017-06-07 8:30
1234         99345           2017-06-07 8:45
The expected output would be:
customer_id  transaction_id  sales_transaction_id_1  sales_transaction_id_2  sales_transaction_id_3
1234         33453           88472                   88477                   NULL
1234         99321           99345                   NULL                    NULL
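To pin down the rule being asked for, here is a minimal reference sketch in Python (not a Redshift solution). It assumes that a new window opens at the first transaction falling outside the current customer's 60-minute window, and that the upper bound is inclusive, as in the query above. The function name `group_into_windows` is made up for illustration.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (customer_id, transaction_id, order_time).
SALES = [
    (1234, 33453, datetime(2017, 6, 5, 13, 30)),
    (1234, 88472, datetime(2017, 6, 5, 13, 45)),
    (1234, 88477, datetime(2017, 6, 5, 14, 10)),
    (1234, 99321, datetime(2017, 6, 7, 8, 30)),
    (1234, 99345, datetime(2017, 6, 7, 8, 45)),
]


def group_into_windows(rows, window=timedelta(minutes=60)):
    """Return one [customer_id, anchor_id, follower_ids...] list per window."""
    groups = []
    anchor = None  # (customer_id, order_time) of the transaction that opened the current window
    for customer_id, transaction_id, order_time in sorted(rows, key=lambda r: (r[0], r[2])):
        in_window = (
            anchor is not None
            and anchor[0] == customer_id
            and order_time <= anchor[1] + window  # inclusive upper bound (assumption)
        )
        if in_window:
            groups[-1].append(transaction_id)
        else:
            # First transaction outside the current window becomes the next anchor.
            anchor = (customer_id, order_time)
            groups.append([customer_id, transaction_id])
    return groups


for group in group_into_windows(SALES):
    print(group)
# [1234, 33453, 88472, 88477]
# [1234, 99321, 99345]
```

The anchor of each window depends on where the previous window ended, which is why a single fixed partition expression is hard to write for this problem.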
In addition, it looks like Redshift does not support lateral joins, which seems to further limit my options. Any help would be greatly appreciated.
Answer 0 (score: 0)
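Based on your description, you just need group by and some kind of date difference. I'm not sure how you want to combine the rows, but this is the basic idea: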
select s.customer_id,
min(order_time) as first_order_in_hour,
max(order_time) as last_order_in_hour,
count(*) as num_orders
from (select s.*,
min(order_time) over (partition by customer_id) as min_ot
from sales s
) s
group by customer_id, floor(datediff(second, min_ot, order_time) / (60 * 60));
This formulation (or something similar, since Postgres does not have datediff()) would also be much faster in Postgres.
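As a rough illustration of what the group by expression computes, the sketch below reproduces the bucket arithmetic in Python on the question's sample order times. The helper name `hour_bucket` is hypothetical; it mirrors floor(datediff(second, min_ot, order_time) / (60 * 60)).

```python
from datetime import datetime

# Order times for customer 1234, taken from the question's sample data.
ORDER_TIMES = [
    datetime(2017, 6, 5, 13, 30),
    datetime(2017, 6, 5, 13, 45),
    datetime(2017, 6, 5, 14, 10),
    datetime(2017, 6, 7, 8, 30),
    datetime(2017, 6, 7, 8, 45),
]


def hour_bucket(order_time, first_order_time):
    """Whole hours elapsed since the customer's first order."""
    return int((order_time - first_order_time).total_seconds() // 3600)


first = min(ORDER_TIMES)  # plays the role of min_ot in the query
buckets = [hour_bucket(t, first) for t in ORDER_TIMES]
print(buckets)  # [0, 0, 0, 43, 43]

# Buckets are fixed relative to the first order, so two orders ten minutes
# apart can fall on either side of a boundary:
print(hour_bucket(datetime(2017, 6, 5, 14, 25), first))  # 0
print(hour_bucket(datetime(2017, 6, 5, 14, 35), first))  # 1
```

On the sample data this yields the same two groups as the expected output. The windows are contiguous 60-minute slots counted from the customer's first order, rather than windows that restart at each new anchor transaction.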
Answer 1 (score: 0)
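You can use window functions to get the subsequent transactions for each transaction. The window will be customer/hour, and you can rank the records to find the first "anchor" transaction and get all the subsequent transactions you need: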
with
transaction_chains as (
    select
        customer_id
        ,transaction_id
        ,order_time
        -- rank transactions within the customer/hour window to find the first "anchor" transaction
        ,row_number() over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as rn
        -- 1st next order
        ,lead(transaction_id,1) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as transaction_id_1
        ,lead(order_time,1) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as order_time_1
        -- 2nd next order
        ,lead(transaction_id,2) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as transaction_id_2
        ,lead(order_time,2) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as order_time_2
        -- 3rd next order
        ,lead(transaction_id,3) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as transaction_id_3
        ,lead(order_time,3) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as order_time_3
    from sales
)
select
    customer_id
    ,transaction_id
    ,transaction_id_1
    ,transaction_id_2
    ,transaction_id_3
from transaction_chains
where rn=1;
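The answer describes the window as customer/hour. Assuming that means the calendar hour of order_time (the equivalent of date_trunc('hour', order_time)), the Python sketch below shows which chains such a partition produces on the question's sample rows. The function name `chains_by_calendar_hour` is made up for illustration.

```python
from datetime import datetime

# Sample rows from the question: (customer_id, transaction_id, order_time).
SALES = [
    (1234, 33453, datetime(2017, 6, 5, 13, 30)),
    (1234, 88472, datetime(2017, 6, 5, 13, 45)),
    (1234, 88477, datetime(2017, 6, 5, 14, 10)),
    (1234, 99321, datetime(2017, 6, 7, 8, 30)),
    (1234, 99345, datetime(2017, 6, 7, 8, 45)),
]


def chains_by_calendar_hour(rows):
    """Group transaction IDs by (customer_id, calendar hour), ordered by time."""
    chains = {}
    for customer_id, transaction_id, order_time in sorted(rows, key=lambda r: (r[0], r[2])):
        # Truncate to the start of the hour, like date_trunc('hour', order_time).
        hour = order_time.replace(minute=0, second=0, microsecond=0)
        chains.setdefault((customer_id, hour), []).append(transaction_id)
    return list(chains.values())


print(chains_by_calendar_hour(SALES))
# [[33453, 88472], [88477], [99321, 99345]]
```

With calendar-hour partitions, transaction 88477 (14:10) starts its own chain even though it falls within 60 minutes of 33453 (13:30), so this approach approximates the expected output in the question without matching it exactly.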