Selecting distinct values in BigQuery using Standard SQL

Asked: 2017-07-26 16:19:19

Tags: google-bigquery standard-sql

I want to select multiple columns and group the rows by email using GROUP BY:
#standardSQL
SELECT
      customers.orderCustomerEmail AS email,      
      customers.orderCustomerNumber AS customerNumber,
      customers.billingFirstname AS billingFirstname,
      customers.billingLastname AS billingLastname
FROM dim_customers AS customers
GROUP BY customers.orderCustomerEmail

This fails with:

Error: SELECT list expression references customers.orderCustomerNumber
       which is neither grouped nor aggregated at [4:7]

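This is similar to the question "Bigquery select distinct values", but that question does not solve my problem, because adding all columns to the GROUP BY gives the same result as SELECT DISTINCT.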

dim_customer schema:

orderCustomerEmail:STRING,
billingFirstname:STRING,
billingLastname:STRING,
orderCustomerNumber:STRING,
OrderNumber:STRING

Dummy data: https://docs.google.com/spreadsheets/d/1T1JZRWni18hhU4tO-9kQqq5Y3hVWgpP-aE7o6ij9bDE/edit?usp=sharing

1 answer:

Answer 0 (score: 0)

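When you group by some columns, every other column in the SELECT list must be wrapped in an aggregate function. Otherwise you get exactly the error shown in your question.

Try the BigQuery Standard SQL example below: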

  
#standardSQL
SELECT 
  customers.orderCustomerEmail AS email,      
  ARRAY_AGG(STRUCT(customers.orderCustomerNumber AS customerNumber,
  customers.billingFirstname AS billingFirstname,
  customers.billingLastname AS billingLastname)) AS info
FROM `dim_customers`, UNNEST(customers) AS customers
GROUP BY email

Or simply use DISTINCT:

#standardSQL
SELECT DISTINCT 
  customers.orderCustomerEmail AS email,      
  customers.orderCustomerNumber AS customerNumber,
  customers.billingFirstname AS billingFirstname,
  customers.billingLastname AS billingLastname
FROM `dim_customers`, UNNEST(customers) AS customers
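
As a further illustration of "apply an aggregate function to the remaining columns", here is a minimal sketch using ANY_VALUE. It assumes the flat table layout given in the question's schema rather than a repeated `customers` field, and the sample rows in the WITH clause are made up for illustration only.

```sql
#standardSQL
WITH `dim_customers` AS (
  -- Hypothetical sample rows, not taken from the asker's data
  SELECT 'a@email.com' AS orderCustomerEmail, '10201' AS orderCustomerNumber, 'Alex' AS billingFirstname, 'Miller' AS billingLastname UNION ALL
  SELECT 'a@email.com', '10201', 'A.', 'Miller' UNION ALL
  SELECT 'b@email.com', '10202', 'Ben', 'Williams'
)
SELECT
  orderCustomerEmail AS email,
  -- ANY_VALUE returns a value from some row of the group; which row is not defined
  ANY_VALUE(orderCustomerNumber) AS customerNumber,
  ANY_VALUE(billingFirstname) AS billingFirstname,
  ANY_VALUE(billingLastname) AS billingLastname
FROM `dim_customers`
GROUP BY orderCustomerEmail
```

This returns one row per email, but each ANY_VALUE call chooses independently, so the three detail columns are not guaranteed to come from the same source row. If the details must stay together, aggregating them as a single STRUCT is the safer choice.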

Note: your question does not specify the expected output, so the queries above will most likely need to be adjusted to your specific needs.

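Update

"I basically need one row per customer (the email is the unique identifier, hence the grouping); the details (number, first name, last name) can be taken from the last entry."

For example: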

#standardSQL
WITH `dim_customers` AS (
  SELECT [
    STRUCT('a' AS orderCustomerEmail, 1 AS orderCustomerNumber, 'af' AS billingFirstname, 'al' AS billingLastname),
    STRUCT('a' AS orderCustomerEmail, 4 AS orderCustomerNumber, 'af1' AS billingFirstname, 'al2' AS billingLastname),
    STRUCT('b' AS orderCustomerEmail, 2 AS orderCustomerNumber, 'bf' AS billingFirstname, 'bl' AS billingLastname),
    STRUCT('c' AS orderCustomerEmail, 3 AS orderCustomerNumber, 'cf' AS billingFirstname, 'cl' AS billingLastname)
    ] AS customers UNION ALL
  SELECT [
    STRUCT('a' AS orderCustomerEmail, 1 AS orderCustomerNumber, 'af' AS billingFirstname, 'al' AS billingLastname),
    STRUCT('a' AS orderCustomerEmail, 4 AS orderCustomerNumber, 'af1' AS billingFirstname, 'al2' AS billingLastname),
    STRUCT('b' AS orderCustomerEmail, 2 AS orderCustomerNumber, 'bf' AS billingFirstname, 'bl' AS billingLastname),
    STRUCT('c' AS orderCustomerEmail, 3 AS orderCustomerNumber, 'cf' AS billingFirstname, 'cl' AS billingLastname)
    ] AS customers
)
SELECT
  customers.orderCustomerEmail AS email,      
  ARRAY_AGG(STRUCT(customers.orderCustomerNumber AS customerNumber,
    customers.billingFirstname AS billingFirstname,
    customers.billingLastname AS billingLastname))[OFFSET(0)] AS info
FROM `dim_customers`, UNNEST(customers) AS customers
GROUP BY email
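
Note that `[OFFSET(0)]` takes the first element of the aggregated array, and without an ORDER BY inside ARRAY_AGG the order of that array is not guaranteed. If "the last entry" needs to be picked deterministically, one possible sketch is to order inside the aggregate and keep a single element. The ordering key here is an assumption: it treats a higher `orderCustomerNumber` as the later entry, which may not hold for the real data.

```sql
#standardSQL
WITH `dim_customers` AS (
  -- Reduced version of the dummy data, for illustration only
  SELECT [
    STRUCT('a' AS orderCustomerEmail, 1 AS orderCustomerNumber, 'af' AS billingFirstname, 'al' AS billingLastname),
    STRUCT('a' AS orderCustomerEmail, 4 AS orderCustomerNumber, 'af1' AS billingFirstname, 'al2' AS billingLastname),
    STRUCT('b' AS orderCustomerEmail, 2 AS orderCustomerNumber, 'bf' AS billingFirstname, 'bl' AS billingLastname)
  ] AS customers
)
SELECT
  customers.orderCustomerEmail AS email,
  ARRAY_AGG(STRUCT(customers.orderCustomerNumber AS customerNumber,
    customers.billingFirstname AS billingFirstname,
    customers.billingLastname AS billingLastname)
    -- Assumption: the highest orderCustomerNumber marks the latest entry
    ORDER BY customers.orderCustomerNumber DESC LIMIT 1)[OFFSET(0)] AS info
FROM `dim_customers`, UNNEST(customers) AS customers
GROUP BY email
```

With these sample rows, email 'a' resolves to the entry with customer number 4 and email 'b' to its only entry.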

Update

Below is a version for the updated schema!

dim_customer schema:

orderCustomerEmail:STRING,
billingFirstname:STRING,
billingLastname:STRING,
orderCustomerNumber:STRING,
OrderNumber:STRING


#standardSQL
WITH `dim_customers` AS (
  SELECT 10201 AS orderCustomerNumber, 'a@email.com' AS orderCustomerEmail, 'Alex' AS billingFirstname, 'Miller' AS billingLastname UNION ALL
  SELECT 10202, 'b@email.com', 'Ben', 'Williams' UNION ALL
  SELECT 10203, 'c@email.com', 'Chris', 'Collins' UNION ALL
  SELECT 10204, 'd@email.com', 'David', 'Hems' UNION ALL
  SELECT 10201, 'a@email.com', 'A.', 'Miller' UNION ALL
  SELECT 10201, 'a@email.com', 'A.', 'Miller' UNION ALL
  SELECT 10202, 'b@email.com', 'Ben', 'Williams' UNION ALL
  SELECT 10202, 'b@email.com', 'Bens Father', 'Williams' UNION ALL
  SELECT 10205, 'a@email.com', 'A.', 'Miller' UNION ALL
  SELECT 10206, 'e@email.com', 'Ed', 'Winchell'
)
SELECT info.* FROM (
  SELECT
    orderCustomerEmail AS email, 
    ARRAY_AGG(STRUCT(
      orderCustomerEmail AS email, 
      orderCustomerNumber AS customerNumber,
      billingFirstname AS billingFirstname,
      billingLastname AS billingLastname))[OFFSET(0)] AS info
  FROM `dim_customers`
  GROUP BY email
)
-- ORDER BY email
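
An alternative way to get one row per email on the flat schema is a window function. The sketch below is not part of the original answer; it assumes that `OrderNumber` is numeric text that grows with each new order, so the highest value identifies the last entry. The sample rows and their `OrderNumber` values are hypothetical.

```sql
#standardSQL
WITH `dim_customers` AS (
  -- Hypothetical sample rows; the OrderNumber values are invented
  SELECT 'a@email.com' AS orderCustomerEmail, '10201' AS orderCustomerNumber, 'Alex' AS billingFirstname, 'Miller' AS billingLastname, '1001' AS OrderNumber UNION ALL
  SELECT 'a@email.com', '10205', 'A.', 'Miller', '1005' UNION ALL
  SELECT 'b@email.com', '10202', 'Ben', 'Williams', '1002'
)
SELECT
  orderCustomerEmail AS email,
  orderCustomerNumber AS customerNumber,
  billingFirstname,
  billingLastname
FROM (
  SELECT
    *,
    -- Number the rows of each email, newest first
    -- (assumption: a larger OrderNumber means a later order)
    ROW_NUMBER() OVER (
      PARTITION BY orderCustomerEmail
      ORDER BY SAFE_CAST(OrderNumber AS INT64) DESC
    ) AS rn
  FROM `dim_customers`
)
WHERE rn = 1
```

With these rows, a@email.com keeps the row with customer number 10205 and b@email.com keeps its single row. SAFE_CAST is used because the schema declares `OrderNumber` as STRING, and comparing the raw strings would sort them lexicographically rather than numerically.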