Using NTH_VALUE

Time: 2018-10-23 09:34:35

Tags: sql google-bigquery

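I have a pageviews table that contains the path of each page a customer visited in each session, along with a timestamp. One of the columns is landing_page, which is populated on every row of the table (all pageviews in a session share the same landing_page).

I am trying to create similar columns, second_page, third_page, and fourth_page, that show the paths of the second, third, and fourth pages visited in the session. I can do this with NTH_VALUE, but I want to handle one specific case: a customer visiting the same page multiple times.

For example, suppose a customer visits pages in the following order: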

  1. www.dummywebsite.com/
  2. www.dummywebsite.com/products
  3. www.dummywebsite.com/products
  4. www.dummywebsite.com/products/prodA
  5. www.dummywebsite.com/cart

With the query below, I get second_page = "www.dummywebsite.com/products" and third_page = "www.dummywebsite.com/products". What I want is for third_page to be "www.dummywebsite.com/products/prodA" instead.

How can I edit the query below to get the desired result?

SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
       NTH_VALUE(path,2 ignore nulls) OVER(win) second_page_path,
       NTH_VALUE(path,3 ignore nulls) OVER(win) third_page_path,
       NTH_VALUE(path,4 ignore nulls) OVER(win) fourth_page_path
   FROM pageviews
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

3 Answers:

Answer 0 (score: 2)

The following is for BigQuery Standard SQL, with minimal changes to the original query:

#standardSQL
SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
  NTH_VALUE(distinct_path,2 IGNORE NULLS) OVER(win) second_page_path,
  NTH_VALUE(distinct_path,3 IGNORE NULLS) OVER(win) third_page_path,
  NTH_VALUE(distinct_path,4 IGNORE NULLS) OVER(win) fourth_page_path
FROM (
  SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
    IF(path = LAG(path) OVER(PARTITION BY user_id, session_id ORDER BY created_at), NULL, path) distinct_path 
  FROM `project.dataset.pageviews`
)
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
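
To see why nulling out repeats and then using IGNORE NULLS gives the desired columns, here is a minimal plain-Python emulation of that logic, run on the example session from the question. It is only a sketch: it does not run against BigQuery, and the helper names `null_out_consecutive_repeats` and `nth_value_ignore_nulls` are hypothetical, introduced just for this illustration.

```python
# Plain-Python emulation of the LAG + NTH_VALUE(... IGNORE NULLS) idea.
# Helper names are hypothetical; the page list is the question's example.

def null_out_consecutive_repeats(paths):
    """Mimic IF(path = LAG(path), NULL, path) over rows ordered by created_at."""
    result = []
    previous = None  # LAG returns NULL for the first row of a partition
    for path in paths:
        result.append(None if path == previous else path)
        previous = path
    return result


def nth_value_ignore_nulls(values, n):
    """Mimic NTH_VALUE(value, n IGNORE NULLS) over an unbounded frame."""
    non_null = [v for v in values if v is not None]
    return non_null[n - 1] if len(non_null) >= n else None


session = [
    "www.dummywebsite.com/",
    "www.dummywebsite.com/products",
    "www.dummywebsite.com/products",
    "www.dummywebsite.com/products/prodA",
    "www.dummywebsite.com/cart",
]

distinct_path = null_out_consecutive_repeats(session)
second_page = nth_value_ignore_nulls(distinct_path, 2)
third_page = nth_value_ignore_nulls(distinct_path, 3)
fourth_page = nth_value_ignore_nulls(distinct_path, 4)

print(second_page)  # www.dummywebsite.com/products
print(third_page)   # www.dummywebsite.com/products/prodA
print(fourth_page)  # www.dummywebsite.com/cart
```

Note that only consecutive repeats are nulled out. A customer who returns to /products after visiting another page would have that later visit counted again, because LAG compares each row only with the row immediately before it.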

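Answer 1 (score: 1)

It would be easier if you posted some sample data, but something along these lines should remove paths that repeat the previous path: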

WITH CTE AS
(SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
if(path=lag(path) OVER (PARTITION BY session_id, user_id ORDER BY created_at),FALSE,TRUE) distinctPath
FROM pageviews)

SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
   NTH_VALUE(path,2 ignore nulls) OVER(win) second_page_path,
   NTH_VALUE(path,3 ignore nulls) OVER(win) third_page_path,
   NTH_VALUE(path,4 ignore nulls) OVER(win) fourth_page_path
FROM CTE  -- read from the CTE; distinctPath is not a column of pageviews
WHERE distinctPath = TRUE
WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN 
 UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
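
One consequence of filtering with WHERE is that the repeated pageview rows are dropped from the result, not just skipped when picking the nth page. The following plain-Python sketch emulates the flag-then-filter idea on the question's example session. It is an illustration under that assumption, not BigQuery code, and `flag_distinct` is a hypothetical helper name.

```python
# Plain-Python emulation of the flag-then-filter idea.
# The helper name is hypothetical; the page list is the question's example.

def flag_distinct(paths):
    """Mimic IF(path = LAG(path), FALSE, TRUE) over rows ordered by created_at."""
    flags = []
    previous = None  # LAG returns NULL for the first row of a partition
    for path in paths:
        flags.append(path != previous)
        previous = path
    return flags


session = [
    "www.dummywebsite.com/",
    "www.dummywebsite.com/products",
    "www.dummywebsite.com/products",
    "www.dummywebsite.com/products/prodA",
    "www.dummywebsite.com/cart",
]

flags = flag_distinct(session)

# Emulates WHERE distinctPath = TRUE: flagged-FALSE rows disappear entirely.
kept_rows = [path for path, keep in zip(session, flags) if keep]

# Emulates NTH_VALUE(path, 3) over the remaining rows.
third_page = kept_rows[2] if len(kept_rows) >= 3 else None

print(len(session), len(kept_rows))  # 5 4
print(third_page)                    # www.dummywebsite.com/products/prodA
```

If every original pageview row must remain in the output, nulling out the repeated path instead of filtering the row keeps the row count unchanged.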

Answer 2 (score: 0)

The following query changes third_page_path by reading the fourth value instead of the third:

SELECT pageview_id, session_id, user_id, created_at, landing_page, path,
           NTH_VALUE(path,2 ignore nulls) OVER(win) second_page_path,
           NTH_VALUE(path,4 ignore nulls) OVER(win) third_page_path,
           NTH_VALUE(path,4 ignore nulls) OVER(win) fourth_page_path
       FROM pageviews
    WINDOW win AS (PARTITION BY user_id, session_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)