BigQuery Standard SQL to get the latest record or first transaction (sale)

Time: 2018-11-09 18:03:11

Tags: sql google-bigquery

I have a table in BigQuery with the following columns:

user_id visit_date  referral    transaction
1234    20180101    site2       0
1234    20180102    site3       1
1234    20180103    site2       1
4567    20180104    site4       0
4567    20180105    site5       0
5678    20180101    site2       0
5678    20180102    site3       1

My goal is to produce a table in the following format:

path                transactions
site2 > site3       2 
site2               1
site4 > site5       0

What I can't figure out is how to "reset" the path for a user who converts more than once within the same time period, as user_id = 1234 does.

So far I have come up with the following query, but it does not produce the desired output.

SELECT
  referral_path,
  SUM(transactions) AS transactions
FROM (
  SELECT
    user_id,
    STRING_AGG(DISTINCT(referral), ',') AS referral_path,
    MAX(transactions) AS transactions
  FROM (
    SELECT
      user_id,
      referral,
      transactions
    FROM
      table
    ORDER BY
      user_id ) a
  GROUP BY
    user_id ) b
GROUP BY
  referral_path
ORDER BY
  transactions DESC

1 Answer:

Answer 0: (score: 2)

Below is for BigQuery Standard SQL

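#standardSQL
SELECT
  path,
  SUM(transaction) transactions
FROM (
  -- One row per (user_id, grp): the visits up to and including each conversion.
  -- ORDER BY inside STRING_AGG keeps the path in visit order.
  SELECT
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT
      user_id, visit_date, referral, transaction,
      -- grp = running total of transactions on the user's earlier visits;
      -- it increases right after each conversion, which "resets" the path.
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path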
You can test and play with the above using the dummy data from your question, as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1 
)
SELECT 
  path, 
  SUM(transaction) transactions
FROM (
  SELECT
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT 
      user_id, visit_date, referral, transaction, 
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp 
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC  

with the result

Row path            transactions     
1   site2 > site3   2    
2   site2           1    
3   site4 > site5   0