I have a data structure in BigQuery that looks like this:
[{
  sessionID: '123456',
  revenue: 100.00,
  pagesViewed: [
    {hit: 1, val: "a.html"}, {hit: 3, val: "b.html"}, {hit: 3, val: "c.html?test=AAC"}, {hit: 10, val: "d.html?test=CCC"}
  ]
},
{
  sessionID: '5555',
  revenue: 50.00,
  pagesViewed: [
    {hit: 1, val: "a.html"}, {hit: 3, val: "b.html?test=123"}, {hit: 9, val: "c.html"}, {hit: 14, val: "d.html"}
  ]
}]
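I'm trying to get the last test ID for each session. For session A (sessionID '123456'), the last test ID should be CCC. For session B (sessionID '5555'), it should be 123. From there, I want the sum of revenue grouped by that final test value.
The query I tried is: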
SELECT
  REGEXP_EXTRACT(mnt, r'\?test\=([^&]*)') AS TestId,
  SUM(rev) AS Revenue
FROM (
  SELECT
    sessionID,
    MAX(CONCAT(CAST(pagesViewed.hit AS STRING), pagesViewed.val)) AS mnt,
    MAX(revenue) AS rev
  FROM
    `table` AS m,
    UNNEST(m.pagesViewed) AS pagesViewed
  WHERE
    pagesViewed.val LIKE "%test=%"
  GROUP BY
    1
  ORDER BY
    1,
    2 ASC)
GROUP BY
  1
ORDER BY
  2 DESC
However, the output does not match the expected values above. Any help would be much appreciated!
Output:
Row TestId Revenue
1 AAC 100.0
2 123 50.0
Expected:
Row TestId Revenue
1 CCC 100.0
2 123 50.0
Answer 0 (score: 1)
This should work for your purposes:
WITH `project.dataset.table` AS (
  SELECT '123456' AS sessionId, 100.00 AS revenue, ARRAY<STRUCT<hit INT64, val STRING>>[(1, 'a.html'), (2, 'b.html'), (3, 'c.html?test=AAC'), (4, 'd.html?test=CCC')] AS pagesViewed UNION ALL
  SELECT '5555', 50.00, ARRAY<STRUCT<hit INT64, val STRING>>[(1, 'a.html'), (2, 'b.html?test=123'), (3, 'c.html'), (4, 'd.html')]
)
SELECT
  (SELECT
     ARRAY_AGG(
       REGEXP_EXTRACT(pageViewed.val, r'\?test\=([^&]*)')
       IGNORE NULLS ORDER BY pageViewed.hit DESC LIMIT 1)[OFFSET(0)]
   FROM UNNEST(pagesViewed) AS pageViewed
  ) AS TestId,
  SUM(revenue) AS Revenue
FROM `project.dataset.table`
GROUP BY 1
ORDER BY 2 DESC;
It returns the last matching 'test' value in the array. You can try it with the sample data in the WITH clause: it gives CCC 100.0 on one row and 123 50.0 on another.
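As for why the original query returned AAC instead of CCC: MAX over CONCAT(CAST(hit AS STRING), val) compares strings character by character, so a value starting with '3' sorts after one starting with '10'. Below is a minimal sketch of that effect, using only the two test hits from the first session in the question. The CTE name `hits` and the output column aliases are made up for illustration and are not part of the original schema.

```sql
-- Hypothetical two-row sample: the test hits of session '123456' from the question.
WITH hits AS (
  SELECT 3 AS hit, 'c.html?test=AAC' AS val UNION ALL
  SELECT 10, 'd.html?test=CCC'
)
SELECT
  -- String comparison: '3c.html...' > '10d.html...' because '3' > '1'.
  MAX(CONCAT(CAST(hit AS STRING), val)) AS max_by_string,   -- '3c.html?test=AAC'
  -- Numeric ordering on hit picks the row with hit = 10.
  ARRAY_AGG(val ORDER BY hit DESC LIMIT 1)[OFFSET(0)] AS last_by_hit  -- 'd.html?test=CCC'
FROM hits;
```

Ordering on the integer `hit` inside ARRAY_AGG, as the answer's query does, avoids the problem because no number is ever compared as text.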