Get the last matching value in a record type by value in BigQuery

Time: 2018-03-29 14:24:30

Tags: sql google-bigquery

I have a data structure in BigQuery that looks like this:

[{
    sessionID: '123456',
    revenue: 100.00,
    pagesViewed: [
      {hit: 1, val: "a.html"}, {hit:3, val: "b.html"}, {hit:3, val: "c.html?test=AAC"}, {hit:10, val:"d.html?test=CCC"}
    ]
},
{
    sessionID: '5555',
    revenue: 50.00,
    pagesViewed: [
      {hit: 1, val: "a.html"}, {hit:3, val: "b.html?test=123"}, {hit:9, val: "c.html"}, {hit:14, val:"d.html"}
    ]
}]

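I'm trying to get the last test ID for each session. For session A (sessionID 123456), the last test ID would be CCC. For session B (sessionID 5555), it should be 123. From there I'm trying to get the sum of revenue by final test value.

The query I tried is: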

SELECT
  REGEXP_EXTRACT(mnt,r'\?test\=([^&]*)') as TestId,
  SUM(rev) as Revenue
FROM (
  SELECT
    sessionID,
    MAX(CONCAT(CAST(pagesViewed.hit AS string),pagesViewed.val)) AS mnt,
    MAX(revenue) AS rev
  FROM
    `table` AS m,
    UNNEST(m.pagesViewed) AS pagesViewed
  WHERE
    pagesViewed.val LIKE "%test=%"
  GROUP BY
    1
  ORDER BY
    1,
    2 ASC)
GROUP BY
  1
ORDER BY
  2 DESC

However, the output does not match the expected values above. Any help would be greatly appreciated!

Output:

Row TestId  Revenue  
1   AAC     100.0    
2   123     50.0    

Expected:

Row TestId  Revenue  
1   CCC     100.0    
2   123     50.0    

1 answer:

Answer 0 (score: 1)

This should work for your purposes:

WITH `project.dataset.table` AS (
  SELECT '123456' AS sessionId, 100.00 AS revenue, ARRAY<STRUCT<hit INT64, val STRING>>[(1, 'a.html'), (2, 'b.html'), (3, 'c.html?test=AAC'), (4, 'd.html?test=CCC')] AS pagesViewed UNION ALL
  SELECT '5555', 50.00, ARRAY<STRUCT<hit INT64, val STRING>>[(1, 'a.html'), (2, 'b.html?test=123'), (3, 'c.html'), (4, 'd.html')]
)
SELECT
  (SELECT
     ARRAY_AGG(
       REGEXP_EXTRACT(pageViewed.val,r'\?test\=([^&]*)')
       IGNORE NULLS ORDER BY pageViewed.hit DESC LIMIT 1)[OFFSET(0)]
   FROM UNNEST(pagesViewed) AS pageViewed
  ) AS TestId,
  SUM(revenue) AS Revenue
FROM `project.dataset.table`
GROUP BY 1
ORDER BY 2 DESC;

It returns the last matching 'test' value in the array. Run against the sample data in the WITH clause, it gives CCC 100.0 in one row and 123 50.0 in the other.
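
As for why the attempt in the question returns AAC for session 123456, a likely explanation is that MAX is applied to a string there. CONCAT(CAST(hit AS STRING), val) produces '3c.html?test=AAC' and '10d.html?test=CCC', and strings compare character by character, so the value starting with '3' sorts after the one starting with '1'. A minimal sketch of that difference, using two rows copied from the question (the CTE name hits and the aliases max_as_string and last_by_hit are made up for illustration):

```sql
-- Two rows copied from session 123456 in the question (hit 3 and hit 10).
-- The CTE name `hits` and the output aliases are hypothetical.
WITH hits AS (
  SELECT 3 AS hit, 'c.html?test=AAC' AS val UNION ALL
  SELECT 10, 'd.html?test=CCC'
)
SELECT
  -- String comparison: '3c.html...' sorts after '10d.html...' because '3' > '1'.
  MAX(CONCAT(CAST(hit AS STRING), val)) AS max_as_string,
  -- Numeric ordering on hit: the row with hit 10 comes first, so its val is kept.
  ARRAY_AGG(val ORDER BY hit DESC LIMIT 1)[OFFSET(0)] AS last_by_hit
FROM hits;
```

Here max_as_string should come back as 3c.html?test=AAC and last_by_hit as d.html?test=CCC. Ordering on the numeric hit column, as the ARRAY_AGG in the answer does, avoids the problem.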