Multiple repeated structures in BigQuery

Date: 2017-08-08 17:43:47

Tags: google-bigquery

This is a follow-up to the question "Bigquery multiple unnest in a single select".

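We are using BigQuery as our warehousing solution and are trying to push its limits by consolidating our data. A simple example is client tracking. A client generates revenue, has several touchpoints on our website, and independently maintains several accounts with us. Business users who run behavioral analysis on clients want to track the number of visits, the revenue generated, and how a client's accounts affect retention. We are trying to evaluate whether a nested structure will work for us.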

Here is an example. I have 3 tables.

Client (C)

C_Key | C_Name
----- | ------
1     | ABC
2     | DEF

Account (A)

A_Key | C_Key
----- | -----
11    | 1
12    | 1
21    | 2
22    | 2
23    | 2

Revenue (R)

R_Key | C_Key | Revenue
----- | ----- | -------
11    | 1     | $10
12    | 1     | $20
21    | 2     | $10

I combined these three into a single nested table using array_agg, as follows:

{Client,
    Accounts:
          [{
            }],
    Revenue:
          [{
              }]
  }

I would like to be able to use multiple UNNESTs in a single query, as follows:

 Select client, Count Distinct(Accounts) and SUM(Revenue) from <single nested 
    table>, unnest accounts, unnest revenue

The expected output is 2 rows:

1,2,$ 30

2,3,$ 10

However, using multiple UNNESTs in the same query produces a cross join between the two arrays. The actual output is:

1,2,$ 60

2,3,$ 30

1 answer:

Answer 0: (score: 0)

The following is for BigQuery Standard SQL.

First, let's clarify how the single nested table is created.

I hope you did something like this:

   
#standardSQL
WITH clients AS (
  SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
  SELECT 2, 'def'
), accounts AS (
  SELECT 11 AS a_key, 1 AS c_key UNION ALL
  SELECT 12, 1 UNION ALL
  SELECT 21, 2 UNION ALL
  SELECT 22, 2 UNION ALL
  SELECT 23, 2
), revenue AS (
  SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
  SELECT 12, 1, 20 UNION ALL
  SELECT 21, 2, 10
), single_nested_table AS (
  SELECT x.c_key, x.c_name, accounts, revenue 
  FROM (
    SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue  
    FROM clients AS c
    LEFT JOIN accounts AS a ON a.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) x
  JOIN (
    SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue  
    FROM clients AS c
    LEFT JOIN revenue AS r ON r.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) y
  ON x.c_key = y.c_key
)
SELECT *
FROM single_nested_table  

which creates the table as:

Row c_key c_name accounts.a_key accounts.c_key revenue.r_key revenue.c_key revenue.revenue
1   1     abc    11             1              11            1             10    
                 12             1              12            1             20    
2   2     def    21             2              21            2             10    
                 22             2                
                 23             2                

The exact query used to create that table is not important, but being clear about its structure/schema is!

Now, back to your question:
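The inflated totals in the question can be reproduced with a small sketch. It assumes an inline copy of the nested table (typed array literals stand in for the ARRAY_AGG query above; the aliases `t`, `a`, `r` and the output column names are illustrative only). Comma-joining two UNNESTs pairs every account of a client with every revenue row of that client, so each revenue value is summed once per account.

```sql
#standardSQL
-- Assumption: inline stand-in for single_nested_table, same sample data as above.
WITH single_nested_table AS (
  SELECT 1 AS c_key, 'abc' AS c_name,
    ARRAY<STRUCT<a_key INT64, c_key INT64>>[(11, 1), (12, 1)] AS accounts,
    ARRAY<STRUCT<r_key INT64, c_key INT64, revenue INT64>>[(11, 1, 10), (12, 1, 20)] AS revenue
  UNION ALL
  SELECT 2, 'def',
    ARRAY<STRUCT<a_key INT64, c_key INT64>>[(21, 2), (22, 2), (23, 2)],
    ARRAY<STRUCT<r_key INT64, c_key INT64, revenue INT64>>[(21, 2, 10)]
)
SELECT
  t.c_key,
  COUNT(*) AS joined_rows,                      -- accounts x revenue rows per client
  COUNT(DISTINCT a.a_key) AS distinct_accounts, -- DISTINCT hides the duplication
  SUM(r.revenue) AS inflated_revenue            -- each revenue row counted once per account
FROM single_nested_table AS t,
  UNNEST(t.accounts) AS a,
  UNNEST(t.revenue) AS r
GROUP BY t.c_key
ORDER BY t.c_key
```

On the sample data this returns 4 joined rows and an inflated revenue of 60 for client 1, and 3 joined rows and 30 for client 2, which matches the actual output in the question. COUNT(DISTINCT ...) is unaffected by the duplication, which is why the account counts looked correct. Aggregating each array on its own, instead of joining the two arrays, avoids the inflation.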

#standardSQL
WITH clients AS (
  SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
  SELECT 2, 'def'
), accounts AS (
  SELECT 11 AS a_key, 1 AS c_key UNION ALL
  SELECT 12, 1 UNION ALL
  SELECT 21, 2 UNION ALL
  SELECT 22, 2 UNION ALL
  SELECT 23, 2
), revenue AS (
  SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
  SELECT 12, 1, 20 UNION ALL
  SELECT 21, 2, 10
), single_nested_table AS (
  SELECT x.c_key, x.c_name, accounts, revenue 
  FROM (
    SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue  
    FROM clients AS c
    LEFT JOIN accounts AS a ON a.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) x
  JOIN (
    SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue  
    FROM clients AS c
    LEFT JOIN revenue AS r ON r.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) y
  ON x.c_key = y.c_key
)
SELECT 
  c_key, c_name, 
  ARRAY_LENGTH(accounts) AS distinct_accounts, 
  (SELECT SUM(revenue) FROM UNNEST(revenue)) AS revenue
FROM single_nested_table   

This gives what you asked for:

Row c_key   c_name  distinct_accounts   revenue  
1   1       abc     2                   30   
2   2       def     3                   10
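Note that ARRAY_LENGTH counts every element of the array, so it equals the number of distinct accounts only when no a_key is repeated within a client's array. If duplicates are possible (an assumption; the sample data has none), a hedged variant is to aggregate each array in its own scalar subquery. The inline table and the aliases `t`, `a`, `r` below are illustrative stand-ins, not part of the original answer.

```sql
#standardSQL
-- Assumption: inline stand-in for single_nested_table, same sample data as above.
WITH single_nested_table AS (
  SELECT 1 AS c_key, 'abc' AS c_name,
    ARRAY<STRUCT<a_key INT64, c_key INT64>>[(11, 1), (12, 1)] AS accounts,
    ARRAY<STRUCT<r_key INT64, c_key INT64, revenue INT64>>[(11, 1, 10), (12, 1, 20)] AS revenue
  UNION ALL
  SELECT 2, 'def',
    ARRAY<STRUCT<a_key INT64, c_key INT64>>[(21, 2), (22, 2), (23, 2)],
    ARRAY<STRUCT<r_key INT64, c_key INT64, revenue INT64>>[(21, 2, 10)]
)
SELECT
  t.c_key,
  t.c_name,
  -- Each subquery unnests only one array, so the arrays never multiply each other.
  (SELECT COUNT(DISTINCT a.a_key) FROM UNNEST(t.accounts) AS a) AS distinct_accounts,
  (SELECT SUM(r.revenue) FROM UNNEST(t.revenue) AS r) AS revenue
FROM single_nested_table AS t
ORDER BY t.c_key
```

On the sample data this produces the same two rows as the result above (2 accounts and 30 for client 1, 3 accounts and 10 for client 2).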