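I'm analyzing attribution data from Google Analytics in BigQuery. To hard-code different attribution models, I first need to attribute each transaction to every visit to the site made by the same visitor_id.
To do that, I'd like to copy the transaction ID to all previous rows of that visitor's data (rows are ordered by visitor_id and visit_number).
For example, I might have a table like this: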
| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
| A | 1 | null |
| A | 2 | null |
| A | 3 | F1245 |
I'd like to end up with a table like this:
| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
| A | 1 | F1245 |
| A | 2 | F1245 |
| A | 3 | F1245 |
However, if I have a table like this:
| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
| B | 1 | null |
| B | 2 | null |
| B | 3 | G1245 |
| B | 4 | null |
I'd like to end up with a table in which only the visits up to and including the transaction get credit for it:
| Visitor_ID | Visit_Number | Transaction_ID |
----------------------------------------------
| B | 1 | G1245 |
| B | 2 | G1245 |
| B | 3 | G1245 |
| B | 4 | null |
Is there a way to do this with a SQL query?
Answer 0 (score: 2)
Try MAX with a window clause. Here is an example:
#standardSQL
WITH Input AS (
  SELECT 'A' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 3 AS Visit_Number, 'F1245' AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 3 AS Visit_Number, 'G1245' AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID
)
SELECT
  * EXCEPT (Transaction_ID),
  -- Order by visit within each visitor so the frame spans this visit and every later one.
  MAX(Transaction_ID) OVER (
    PARTITION BY Visitor_ID ORDER BY Visit_Number ASC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS Transaction_ID
FROM Input
ORDER BY Visitor_ID, Visit_Number ASC;
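To make the frame `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` easier to reason about, here is a minimal Python sketch that mimics it by hand, assuming each visitor's rows are ordered by Visit_Number. The function name `max_over_following` and the in-memory tuples are illustrative only and are not part of BigQuery.

```python
from collections import defaultdict


def max_over_following(rows):
    """Mimic MAX(Transaction_ID) over the current row and all later rows.

    rows: iterable of (visitor_id, visit_number, transaction_id or None).
    Returns a list of (visitor_id, visit_number, filled_transaction_id).
    """
    by_visitor = defaultdict(list)
    for visitor_id, visit_number, transaction_id in rows:
        by_visitor[visitor_id].append((visit_number, transaction_id))

    result = []
    for visitor_id in sorted(by_visitor):
        visits = sorted(by_visitor[visitor_id], key=lambda v: v[0])
        running_max = None
        filled = []
        # Walk from the last visit to the first, so running_max always
        # covers the current visit plus every visit after it.
        for visit_number, transaction_id in reversed(visits):
            if transaction_id is not None and (
                running_max is None or transaction_id > running_max
            ):
                running_max = transaction_id
            filled.append((visitor_id, visit_number, running_max))
        result.extend(reversed(filled))
    return result


rows = [
    ("A", 1, None), ("A", 2, None), ("A", 3, "F1245"),
    ("B", 1, None), ("B", 2, None), ("B", 3, "G1245"), ("B", 4, None),
]
for row in max_over_following(rows):
    print(row)
```

Visit 4 of visitor B stays empty because no transaction occurs at or after it. One caveat follows directly from using MAX: if a visitor has several transactions, every earlier visit receives the largest ID (by string comparison) among the later rows, which is not necessarily the nearest one.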
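Answer 1 (score: 0)
Try the following for BigQuery Standard SQL.
This version covers the case where the same visitor has several transactions, so each visit is assigned to the nearest transaction at or after it.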
#standardSQL
WITH Input AS (
  SELECT 'A' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 3 AS Visit_Number, 'F1245' AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 5 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 6 AS Visit_Number, 'F1246' AS Transaction_ID UNION ALL
  SELECT 'A' AS Visitor_ID, 7 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 3 AS Visit_Number, 'G1245' AS Transaction_ID UNION ALL
  SELECT 'B' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID
)
SELECT
  Visitor_ID,
  Visit_Number,
  Transaction_ID AS originalTransaction_ID,
  -- Prefix each ID with a fixed-width 7-character key (1000000 + Visit_Number) so that MIN
  -- picks the earliest transaction in the frame; SUBSTR(..., 8) then strips that prefix.
  SUBSTR(MIN(CONCAT(CAST(1000000 + Visit_Number AS STRING), Transaction_ID)) OVER(win), 8) AS Transaction_ID
FROM Input
-- Descending order with UNBOUNDED PRECEDING means the frame holds this visit and all later visits.
WINDOW win AS (PARTITION BY Visitor_ID ORDER BY Visit_Number DESC ROWS UNBOUNDED PRECEDING)
ORDER BY Visitor_ID, Visit_Number
The result is as follows:
| Visitor_ID | Visit_Number | originalTransaction_ID | Transaction_ID |
-------------------------------------------------------------------------
| A | 1 | null | F1245 |
| A | 2 | null | F1245 |
| A | 3 | F1245 | F1245 |
| A | 4 | null | F1246 |
| A | 5 | null | F1246 |
| A | 6 | F1246 | F1246 |
| A | 7 | null | null |
| B | 1 | null | F1245 |
| B | 2 | null | G1245 |
| B | 3 | G1245 | G1245 |
| B | 4 | null | null |
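The sort-key idea behind this query (prefix each transaction ID with a fixed-width visit number, take the minimum over the current and later visits, then strip the prefix) can be sketched in plain Python. This is only an illustration under the assumption that visit numbers stay below 9,000,000, so that `1000000 + visit_number` always has exactly 7 digits; the function name `next_transaction` is hypothetical.

```python
from collections import defaultdict

PREFIX_LEN = 7  # len(str(1000000 + n)) for 0 <= n < 9000000


def next_transaction(rows):
    """For each visit, find the nearest transaction at or after that visit.

    rows: iterable of (visitor_id, visit_number, transaction_id or None).
    Returns a list of (visitor_id, visit_number, filled_transaction_id).
    """
    by_visitor = defaultdict(list)
    for visitor_id, visit_number, transaction_id in rows:
        by_visitor[visitor_id].append((visit_number, transaction_id))

    result = []
    for visitor_id in sorted(by_visitor):
        visits = sorted(by_visitor[visitor_id], key=lambda v: v[0])
        best_key = None
        filled = []
        # Walk from the last visit to the first, tracking the smallest key
        # seen so far; rows without a transaction contribute no key.
        for visit_number, transaction_id in reversed(visits):
            if transaction_id is not None:
                key = str(1000000 + visit_number) + transaction_id
                if best_key is None or key < best_key:
                    best_key = key
            # Dropping PREFIX_LEN characters matches a 1-based SUBSTR(key, 8).
            value = best_key[PREFIX_LEN:] if best_key is not None else None
            filled.append((visitor_id, visit_number, value))
        result.extend(reversed(filled))
    return result


rows = [
    ("A", 1, None), ("A", 2, None), ("A", 3, "F1245"), ("A", 4, None),
    ("A", 5, None), ("A", 6, "F1246"), ("A", 7, None),
    ("B", 1, None), ("B", 2, None), ("B", 3, "G1245"), ("B", 4, None),
]
for row in next_transaction(rows):
    print(row)
```

Because the key is fixed-width, string comparison orders keys by visit number first, so the minimum always belongs to the earliest transaction in the frame regardless of how the IDs themselves compare.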