Transposing rows into columns in BigQuery using Standard SQL

Asked: 2017-08-01 11:01:13

Tags: google-bigquery transpose standard-sql

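Good morning,

I'm trying to transpose some data in BigQuery. I've seen a few other people ask about this on Stack Overflow, but the approach there seems to rely on legacy SQL (using group_concat_unquoted) rather than Standard SQL. I would use legacy SQL, but I've had problems with nested data in it before, so I've used only Standard SQL since then.

Below is my example. For some context, I'm trying to map out the following customer journeys: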

uniqueid | page_flag | order_of_pages
A        | Collection|   1
A        | Product   |   2
A        | Product   |   3
A        | Login     |   4
A        | Delivery  |   5
B        | Clearance |   1
B        | Search    |   2
B        | Product   |   3
C        | Search    |   1
C        | Collection|   2
C        | Product   |   3

However, I'd like to transpose the data so that it looks like this:

uniqueid | 1          | 2          | 3       | 4     | 5 
A        | Collection | Product    | Product | Login | Delivery
B        | Clearance  | Search     | Product | NULL  | NULL
C        | Search     | Collection | Product | NULL  | NULL

I tried using multiple LEFT JOINs, but this fails with the error shown after the query:

select a.uniqueid, 
b.page_flag as page1,
c.page_flag as page2,
d.page_flag as page3,
e.page_flag as page4,
f.page_flag as page5

from

(select distinct uniqueid, 
(case when uniqueid is not null then 1 end) as page_hit1,
(case when uniqueid is not null then 2 end) as page_hit2,
(case when uniqueid is not null then 3 end) as page_hit3,
(case when uniqueid is not null then 4 end) as page_hit4,
(case when uniqueid is not null then 5 end) as page_hit5
from `mytable`) a

LEFT JOIN (
SELECT *
from `mytable`) b on a.uniqueid = b.uniqueid
and a.page_hit1 = b.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) c on a.uniqueid = c.uniqueid
and a.page_hit2 = c.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) d on a.uniqueid = d.uniqueid
and a.page_hit3 = d.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) e on a.uniqueid = e.uniqueid
and a.page_hit4 = e.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) f on a.uniqueid = f.uniqueid
and a.page_hit5 = f.order_of_pages



Error: Query exceeded resource limits for tier 1. Tier 13 or higher required.

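I've looked at the array functions, but I've never used them before, and I'm not sure whether they only work for transposing the other way round (columns into rows). Any suggestions would be great.

Thanks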

1 answer:

Answer (score: 3)

For BigQuery Standard SQL:

   
#standardSQL
SELECT 
  uniqueid,
  MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
  MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
  MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
  MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
  MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid 

You can play with and test this using the dummy data from your question:

#standardSQL
WITH `mytable` AS (
  SELECT 'A' AS uniqueid, 'Collection' AS page_flag, 1 AS order_of_pages UNION ALL
  SELECT 'A', 'Product', 2 UNION ALL
  SELECT 'A', 'Product', 3 UNION ALL
  SELECT 'A', 'Login', 4 UNION ALL
  SELECT 'A', 'Delivery', 5 UNION ALL
  SELECT 'B', 'Clearance', 1 UNION ALL
  SELECT 'B', 'Search', 2 UNION ALL
  SELECT 'B', 'Product', 3 UNION ALL
  SELECT 'C', 'Search', 1 UNION ALL
  SELECT 'C', 'Collection', 2 UNION ALL
  SELECT 'C', 'Product', 3 
)
SELECT 
  uniqueid,
  MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
  MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
  MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
  MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
  MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid 
ORDER BY uniqueid   

The result is:

uniqueid    p1          p2          p3      p4      p5   
A           Collection  Product     Product Login   Delivery     
B           Clearance   Search      Product null    null     
C           Search      Collection  Product null    null
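BigQuery Standard SQL later gained a PIVOT operator (it did not exist when this question was asked), which expresses the same conditional aggregation more compactly. The sketch below is an alternative to the MAX(IF(...)) query, not the answer's original method; it assumes the same `mytable` columns and a fixed list of five page positions. Run against the dummy data above, it should return the same three rows as the MAX(IF(...)) query.

```sql
#standardSQL
-- Sketch using the PIVOT operator (added to BigQuery after this answer was written).
-- The subquery keeps only the three relevant columns: PIVOT implicitly groups
-- by every input column that is neither aggregated nor named in FOR, which
-- leaves uniqueid as the grouping key.
SELECT *
FROM (
  SELECT uniqueid, page_flag, order_of_pages
  FROM `mytable`
)
PIVOT (
  -- At most one page_flag per (uniqueid, order_of_pages), so MAX just picks it;
  -- positions a user never reached come out as NULL.
  MAX(page_flag)
  FOR order_of_pages IN (1 AS p1, 2 AS p2, 3 AS p3, 4 AS p4, 5 AS p5)
)
ORDER BY uniqueid
```

As with the MAX(IF(...)) version, the list of positions is hard-coded, so a journey longer than five pages needs more entries in the IN list.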

Depending on your needs, you might also consider the following approach (although it is not a pivot):

#standardSQL
SELECT uniqueid,
   STRING_AGG(page_flag, '>' ORDER BY order_of_pages) AS journey
FROM `mytable`
GROUP BY uniqueid
ORDER BY uniqueid   

If you run it against the same dummy data as above, the result is:

uniqueid    journey  
A           Collection>Product>Product>Login>Delivery    
B           Clearance>Search>Product     
C           Search>Collection>Product
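On the question of whether array functions only work the other way round: they can also be used in this direction. The sketch below collects each user's pages into an ordered ARRAY with ARRAY_AGG and then reads positions out with SAFE_OFFSET, which returns NULL rather than raising an error when the array is shorter than the requested position. It assumes `order_of_pages` runs from 1 with no gaps or duplicates per `uniqueid`; array positions are packed, so a missing page number would shift the later pages one column to the left. Against the dummy data it should give the same p1 to p5 values as the MAX(IF(...)) query.

```sql
#standardSQL
-- Sketch: build one ordered array of pages per user, then index into it.
-- Assumes order_of_pages is contiguous from 1 for every uniqueid.
SELECT
  uniqueid,
  pages[SAFE_OFFSET(0)] AS p1,  -- OFFSET is zero-based
  pages[SAFE_OFFSET(1)] AS p2,
  pages[SAFE_OFFSET(2)] AS p3,
  pages[SAFE_OFFSET(3)] AS p4,  -- NULL for users with fewer than 4 pages
  pages[SAFE_OFFSET(4)] AS p5
FROM (
  SELECT
    uniqueid,
    ARRAY_AGG(page_flag ORDER BY order_of_pages) AS pages
  FROM `mytable`
  GROUP BY uniqueid
)
ORDER BY uniqueid
```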