Using SQL to copy transaction_id to all previous timestamped rows with the same user_id?

Posted: 2017-04-25 17:22:17

Tags: sql google-bigquery

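I'm analyzing Google Analytics attribution data in BigQuery. To hard-code different attribution models, for each transaction I first want to attribute that transaction to every visit the same visitor_id made to the site.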

To do this, I want to copy the transaction ID to all of that visitor's earlier rows (rows are ordered by visitor_id and visit_number).

For example, I might have a table like this:

| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
|     A      |       1      |       null     |
|     A      |       2      |       null     |
|     A      |       3      |       F1245    |

I'd like to end up with a table like this:

| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
|     A      |       1      |       F1245    |
|     A      |       2      |       F1245    |
|     A      |       3      |       F1245    |

However, if I have a table like this:

| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
|     B      |       1      |       null     |
|     B      |       2      |       null     |
|     B      |       3      |       G1245    |
|     B      |       4      |       null     |

I'd like to end up with a table where only the visits up to and including the transaction get credit for it:

| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
|     B      |       1      |       G1245    |
|     B      |       2      |       G1245    |
|     B      |       3      |       G1245    |
|     B      |       4      |       null     |

Is there a way to do this with a SQL query?
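Before comparing SQL approaches, the rule in the question can be pinned down as a small reference function to check query output against. This is a plain-Python sketch, not part of the original question: the function name `attribute_backwards` is hypothetical, and it assumes that a visitor with several transactions should credit each visit only to the nearest transaction at or after it (the interpretation the second answer uses). It walks each visitor's visits from last to first, carrying the most recent transaction seen, so visits after a visitor's last transaction stay null.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Tuple

# (visitor_id, visit_number, transaction_id); transaction_id is None on non-buying visits
Row = Tuple[str, int, Optional[str]]


def attribute_backwards(rows: List[Row]) -> List[Row]:
    """Credit each visit with the visitor's next transaction at or after that visit."""
    result: List[Row] = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by visitor, then visit number
    for visitor, group in groupby(ordered, key=itemgetter(0)):
        pending: Optional[str] = None  # nearest transaction seen so far, walking backwards
        filled: List[Row] = []
        for _, visit_number, transaction_id in reversed(list(group)):
            if transaction_id is not None:
                pending = transaction_id
            filled.append((visitor, visit_number, pending))
        result.extend(reversed(filled))  # restore ascending visit order
    return result


if __name__ == "__main__":
    visits = [("B", 1, None), ("B", 2, None), ("B", 3, "G1245"), ("B", 4, None)]
    for row in attribute_backwards(visits):
        print(row)
```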

2 answers:

Answer 0 (score: 2)

Try MAX as a window (analytic) function over the current visit and all following visits. Here's an example:

#standardSQL
WITH Input AS (
  SELECT 'A' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 3 AS Visit_Number, 'F1245' AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 3 AS Visit_Number, 'G1245' AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID
)
SELECT
  * EXCEPT (Transaction_ID),
  MAX(Transaction_ID) OVER (
    PARTITION BY Visitor_ID ORDER BY Visit_Number ASC  -- frame = this visit and all later ones
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS Transaction_ID
FROM Input
ORDER BY Visitor_ID, Visit_Number ASC;
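To check the MAX-over-following-rows idea without a BigQuery project, a minimal sketch can run the same window frame in SQLite through Python's `sqlite3` module. It assumes a SQLite build with window-function support (3.25 or later). The table name `visits` and function `run_max_following` are hypothetical, the frame is ordered by `Visit_Number` so that "current row and following" means the current and later visits, and the columns are listed explicitly because SQLite has no `SELECT * EXCEPT`.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, int, Optional[str]]

# MAX over this visit and every later visit of the same visitor (NULLs are ignored).
QUERY = """
SELECT
  Visitor_ID,
  Visit_Number,
  MAX(Transaction_ID) OVER (
    PARTITION BY Visitor_ID ORDER BY Visit_Number
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS Transaction_ID
FROM visits
ORDER BY Visitor_ID, Visit_Number
"""


def run_max_following(rows: List[Row]) -> List[Row]:
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE visits (Visitor_ID TEXT, Visit_Number INTEGER, Transaction_ID TEXT)"
        )
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("A", 1, None), ("A", 2, None), ("A", 3, "F1245"),
        ("B", 1, None), ("B", 2, None), ("B", 3, "G1245"), ("B", 4, None),
    ]
    for row in run_max_following(sample):
        print(row)
```

Because `MAX` takes the largest ID anywhere later in the visit history, a visitor with two transactions (for example F1245 on visit 3 and F1246 on visit 6) would have visits 1 to 3 credited to F1246 as well, so this approach is only exact when each visitor has at most one transaction.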

Answer 1 (score: 0)

Try the BigQuery Standard SQL below.

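This version also covers the case where the same visitor has several transactions: each transaction is credited only to the visits up to and including its own visit_number (after the visitor's previous transaction).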

#standardSQL
WITH Input AS (
  SELECT 'A' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 3 AS Visit_Number, 'F1245' AS Transaction_ID UNION ALL  
  SELECT 'A' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 5 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 6 AS Visit_Number, 'F1246' AS Transaction_ID UNION ALL  
  SELECT 'A' AS Visitor_ID, 7 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 3 AS Visit_Number, 'G1245' AS Transaction_ID UNION ALL  
  SELECT 'B' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID
)
SELECT
  Visitor_ID, 
  Visit_Number, 
  Transaction_ID AS originalTransaction_ID, 
  -- 1000000 + Visit_Number is a fixed-width 7-character key, so MIN picks the nearest later
  -- transaction; SUBSTR(..., 8) then strips the key and keeps only the transaction ID
  SUBSTR(MIN(CONCAT(CAST(1000000 + Visit_Number AS STRING), Transaction_ID)) OVER(win), 8) AS Transaction_ID
FROM Input
WINDOW win AS (PARTITION BY Visitor_ID ORDER BY Visit_Number DESC ROWS UNBOUNDED PRECEDING)
ORDER BY Visitor_ID, Visit_Number

The result is:

Visitor_ID  Visit_Number    originalTransaction_ID  Transaction_ID
A           1               null                    F1245
A           2               null                    F1245
A           3               F1245                   F1245
A           4               null                    F1246
A           5               null                    F1246
A           6               F1246                   F1246
A           7               null                    null
B           1               null                    G1245
B           2               null                    G1245
B           3               G1245                   G1245
B           4               null                    null
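The packed-key trick in this answer can also be exercised locally. A minimal sketch, assuming a SQLite build with window-function support (3.25 or later), transcribes the query into SQLite via Python's `sqlite3` module: `CONCAT` becomes `||` and `STRING` becomes `TEXT`, while the table name `visits` and function `run_next_transaction` are hypothetical. The key `1000000 + Visit_Number` is seven characters long (assuming visit numbers stay below 9,000,000), so equal-width strings sort in visit order and the transaction ID starts at position 8.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, int, Optional[str]]

# Frame = visits from the highest number down to the current one, i.e. this visit and later.
# "||" yields NULL when Transaction_ID is NULL, so MIN only sees visits with a transaction;
# MIN of the packed strings is the nearest later transaction, and SUBSTR(..., 8) drops the key.
QUERY = """
SELECT
  Visitor_ID,
  Visit_Number,
  SUBSTR(
    MIN(CAST(1000000 + Visit_Number AS TEXT) || Transaction_ID) OVER (
      PARTITION BY Visitor_ID ORDER BY Visit_Number DESC ROWS UNBOUNDED PRECEDING
    ),
    8
  ) AS Transaction_ID
FROM visits
ORDER BY Visitor_ID, Visit_Number
"""


def run_next_transaction(rows: List[Row]) -> List[Row]:
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE visits (Visitor_ID TEXT, Visit_Number INTEGER, Transaction_ID TEXT)"
        )
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("A", 1, None), ("A", 2, None), ("A", 3, "F1245"), ("A", 4, None),
        ("A", 5, None), ("A", 6, "F1246"), ("A", 7, None),
        ("B", 1, None), ("B", 2, None), ("B", 3, "G1245"), ("B", 4, None),
    ]
    for row in run_next_transaction(sample):
        print(row)
```

On BigQuery itself, `FIRST_VALUE(Transaction_ID IGNORE NULLS)` over `PARTITION BY Visitor_ID ORDER BY Visit_Number ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` should give the same nearest-next attribution without packing the visit number into a string; that variant is not exercised by this sketch.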