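I have a pageviews table that holds the path and timestamp of every page a customer visits in each session. One of its columns is landing_page, which is populated on every row of the table (all pageviews in a session share the same landing_page).
I am trying to create similar columns, second_page, third_page and fourth_page, that show the paths of the second, third and fourth pages visited in the session. I can do this with NTH_VALUE, but I want to handle one specific case: a customer visiting the same page several times in a row.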
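For example, suppose a customer visits pages in the following order:
With the query below I get second_page = "www.dummywebsite.com/products" and third_page = "www.dummywebsite.com/products". What I want is for third_page to be "www.dummywebsite.com/products/prodA" instead.
How can I edit the following query to get the desired result?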
SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
NTH_VALUE(path,2 ignore nulls) OVER(win) second_page_path,
NTH_VALUE(path,3 ignore nulls) OVER(win) third_page_path,
NTH_VALUE(path,4 ignore nulls) OVER(win) fourth_page_path
FROM pageviews
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
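Answer 0 (score: 2)
The following is for BigQuery Standard SQL and makes minimal changes to the original query. The inner query sets distinct_path to NULL whenever a row's path equals the previous path in the same session, so NTH_VALUE with IGNORE NULLS skips those repeats.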
#standardSQL
SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
NTH_VALUE(distinct_path,2 IGNORE NULLS) OVER(win) second_page_path,
NTH_VALUE(distinct_path,3 IGNORE NULLS) OVER(win) third_page_path,
NTH_VALUE(distinct_path,4 IGNORE NULLS) OVER(win) fourth_page_path
FROM (
SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
IF(path = LAG(path) OVER(PARTITION BY user_id, session_id ORDER BY created_at), NULL, path) distinct_path
FROM `project.dataset.pageviews`
)
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
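The logic of that query can be traced outside the database. The following is a minimal Python sketch, not BigQuery code: the sample rows, the "/home" landing path, and the helper name nth_distinct_paths are all hypothetical and do not come from the question. It drops a path that equals the immediately preceding path in the same session, then picks the nth remaining value, which is what the LAG plus NTH_VALUE ... IGNORE NULLS combination appears to do.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (session_id, created_at, path).
# These are illustrative only and are not the data from the question.
pageviews = [
    (1, 1, "www.dummywebsite.com/home"),
    (1, 2, "www.dummywebsite.com/products"),
    (1, 3, "www.dummywebsite.com/products"),
    (1, 4, "www.dummywebsite.com/products/prodA"),
]


def nth_distinct_paths(rows, n_values=(2, 3, 4)):
    """Return {session_id: {n: path or None}} after collapsing consecutive repeats."""
    result = {}
    # Equivalent of PARTITION BY session_id ORDER BY created_at.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for session_id, group in groupby(ordered, key=itemgetter(0)):
        distinct = []
        previous = None
        for _, _, path in group:
            # Equivalent of IF(path = LAG(path), NULL, path) plus IGNORE NULLS:
            # keep a path only if it is non-null and differs from the previous row.
            if path is not None and path != previous:
                distinct.append(path)
            previous = path
        result[session_id] = {
            n: distinct[n - 1] if len(distinct) >= n else None for n in n_values
        }
    return result


pages = nth_distinct_paths(pageviews)
print(pages[1][2])  # second page
print(pages[1][3])  # third page
```

Only consecutive repeats are collapsed. A page revisited later in the session, after some other page, still counts as a new entry, which matches the behavior of comparing against LAG(path).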
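Answer 1 (score: 1)
It would be easier if you posted some sample data, but something along these lines should remove paths that duplicate the preceding path.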
WITH CTE AS
(SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
if(path=lag(path) OVER (PARTITION BY session_id, user_id ORDER BY created_at),FALSE,TRUE) distinctPath
FROM pageviews)
SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
NTH_VALUE(path,2 ignore nulls) OVER(win) second_page_path,
NTH_VALUE(path,3 ignore nulls) OVER(win) third_page_path,
NTH_VALUE(path,4 ignore nulls) OVER(win) fourth_page_path
FROM CTE
WHERE distinctPath = TRUE
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
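One consequence of filtering with WHERE is worth checking before adopting this version: the repeated pageview rows disappear from the result set altogether, whereas the NULL-based approach keeps every row. The Python sketch below illustrates this under stated assumptions: the rows, the pageview_id values, the "/home" path, and the helper name flag_distinct are hypothetical, and paths are assumed to be non-null.

```python
# Hypothetical rows for a single session: (pageview_id, created_at, path).
# Illustrative only; paths are assumed to be non-null.
rows = [
    (101, 1, "www.dummywebsite.com/home"),
    (102, 2, "www.dummywebsite.com/products"),
    (103, 3, "www.dummywebsite.com/products"),
    (104, 4, "www.dummywebsite.com/products/prodA"),
]


def flag_distinct(session_rows):
    """Mimic the CTE: flag is False when a path equals the previous row's path."""
    flagged = []
    previous = None
    for pageview_id, _, path in sorted(session_rows, key=lambda r: r[1]):
        flagged.append((pageview_id, path, path != previous))
        previous = path
    return flagged


flagged = flag_distinct(rows)

# Mimic WHERE distinctPath = TRUE: the repeated pageview is dropped entirely.
kept = [(pid, path) for pid, path, is_distinct in flagged if is_distinct]
kept_ids = [pid for pid, _ in kept]

# Third value among the remaining rows, or None if the session is too short.
third_page = kept[2][1] if len(kept) >= 3 else None

print(kept_ids)
print(third_page)
```

If every pageview must stay in the output, the flag can be used to null out the path instead of filtering on it.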
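Answer 2 (score: 0)
The following query changes third_page_path by reading position 4 instead of position 3. The position is hard-coded, so it only fits sessions shaped like the one in the example.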
SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
NTH_VALUE(path,2 ignore nulls) OVER(win) second_page_path,
NTH_VALUE(path,4 ignore nulls) OVER(win) third_page_path,
NTH_VALUE(path,4 ignore nulls) OVER(win) fourth_page_path
FROM pageviews
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)