假设我有两个表:sales
和page_views
。
我想看看用户在购买产品之前查看的最后n页是什么。在Vertica中执行此操作的查询是什么?
销售表:
|sale_id| date | user_id | promotion_id |
-----------------------------------------------
| 1 | 2018-05-01 | A | 1 |
| 2 | 2018-05-01 | B | 2 |
| 3 | 2018-05-01 | C | 1 |
| 4 | 2018-05-01 | D | 2 |
page_views表:
| page_id | date | user_id |
----------------------------------
| 1 | 2018-04-30 | A |
| 3 | 2018-04-29 | A |
| 1 | 2018-04-28 | A |
| 1 | 2018-04-30 | B |
| 2 | 2018-04-29 | B |
| 1 | 2018-04-30 | C |
| 1 | 2018-04-30 | D |
| 2 | 2018-04-29 | D |
输出表:
| sale_id | promotion_id | page_id-1 | page_id-2 | page_id-3 |
--------------------------------------------------------------
| 1 | 1 | 1 | 3 | 1 |
| 2 | 2 | 1 | 1 | 0 |
| 3 | 1 | 1 | 0 | 0 |
| 4 | 2 | 1 | 2 | 0 |
在这种情况下,如果少于n次交互,则用虚拟值替换id(可以是0或-1)
答案 0 :(得分:0)
您可以使用union all
合并这两个表格。然后根据每行后的销售ID 分配一个组。然后枚举每个组中的值和pivot:
with tp as (
select user_id, sales_id, promotion_id, date, null as page_id
from sales
union all
select user_id, null, null, date, page_id
from page_views
),
tp2 as (
select user_id,
coalesce(sales_id,
first_value(sales_id ignore nulls) over (partition by user_id order by date desc)
) as sales_id,
coalesce(promotion_id,
first_value(sales_id ignore nulls) over (partition by user_id order by date desc)
) as promotion_id,
date, page_id
from tp2
),
tp3 as (
select row_number() over (partition by user_id, sales_id) order by desc desc) as seqnum,
tp2.*
from tp2
)
select user_id, sales_id, promotion_id,
max(case when seqnum = 2 then page_id end) as page_1,
max(case when seqnum = 3 then page_id end) as page_2,
max(case when seqnum = 4 then page_id end) as page_3
from tp3;
group by user_
id,sales_id,promotion_id;
答案 1 :(得分:0)
我无法抗拒 - “如果你有一把锤子,你的整个世界都是钉子......”
您的查询是指由一系列事件组成的模式:销售事件前面有一个或多个网页浏览事件。
所以我:
a)从sales
和page_views
b)将Vertica的MATCH()子句应用于该UNION SELECT - 获取match_id
和pattern_id
- 以找到我之后的模式......
c)最后,正如戈登·林诺夫(Gordon Linoff)所做的那样,将BY-user_id组合起来。
-- create the two input tables as temporary input, so you can play if you like ...
CREATE LOCAL TEMPORARY TABLE
sales(sale_id,date,user_id,promotion_id)
ON COMMIT PRESERVE ROWS AS (
SELECT 1,DATE '2018-05-01','A',1
UNION ALL SELECT 2,DATE '2018-05-01','B',2
UNION ALL SELECT 3,DATE '2018-05-01','C',1
UNION ALL SELECT 4,DATE '2018-05-01','D',2
)
;
CREATE LOCAL TEMPORARY TABLE
page_views(page_id,date,user_id)
ON COMMIT PRESERVE ROWS AS (
SELECT 1,DATE '2018-04-30','A'
UNION ALL SELECT 3,DATE '2018-04-29','A'
UNION ALL SELECT 1,DATE '2018-04-28','A'
UNION ALL SELECT 1,DATE '2018-04-30','B'
UNION ALL SELECT 2,DATE '2018-04-29','B'
UNION ALL SELECT 1,DATE '2018-04-30','C'
UNION ALL SELECT 1,DATE '2018-04-30','D'
UNION ALL SELECT 2,DATE '2018-04-29','D'
)
;
-- here's your query ...
WITH tser AS (
SELECT
sale_id
, NULL::INT AS page_id
, user_id
, promotion_id
, date
FROM sales
UNION ALL SELECT
NULL::INT AS sale_id
, page_id
, user_id
, NULL::INT AS promotion_id
, date
FROM page_views
ORDER BY
user_id
, date
)
,
w_pattern AS (
SELECT
NVL(sale_id,page_id) AS ev_id
, user_id
, promotion_id
, date
, event_name()
, pattern_id()
, match_id()
FROM tser
MATCH(
PARTITION BY user_id
ORDER BY date DESC
DEFINE
sale AS (sale_id IS NOT NULL)
, pgview AS (page_id IS NOT NULL)
PATTERN p AS (sale pgview+)
ROWS MATCH FIRST EVENT
)
)
SELECT
MAX(CASE match_id WHEN 1 THEN ev_id END) AS sale_id
, MAX(CASE match_id WHEN 1 THEN promotion_id END) AS promotion_id
, MAX(CASE match_id WHEN 2 THEN ev_id END) AS page_id_1
, MAX(CASE match_id WHEN 3 THEN ev_id END) AS page_id_2
, MAX(CASE match_id WHEN 4 THEN ev_id END) AS page_id_3
FROM w_pattern
GROUP BY
user_id
, pattern_id
ORDER BY 1
sale_id|promotion_id|page_id_1|page_id_2|page_id_3
1| 1| 1| 3| 1
2| 2| 1| 2|-
3| 1| 1|- |-
4| 2| 1| 2|-
开心玩.... 马可