Why does BigQuery standard SQL return multiple rows while legacy SQL returns only one?

Time: 2017-02-23 16:12:00

Tags: google-bigquery standard-sql

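Can you help me with a BigQuery standard SQL syntax question? I am trying to understand why this standard SQL query returns 2 rows while the legacy SQL query returns only 1 (I expect just one row), and how to fix it.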

StandardSQL

SELECT
  hits2.transaction.transactionId AS transactionId
FROM `ga-export-TTTT.1234567890.ga_sessions_*`,
  UNNEST(hits) AS hits2
WHERE
  hits2.transaction.transactionId = '03971163'

LegacySQL

select
hits.transaction.transactionId
FROM
 TABLE_DATE_RANGE([ga-export-TTTT:1234567890.ga_sessions_], TIMESTAMP('2016-09-01'), TIMESTAMP('2017-02-14'))
WHERE 
hits.transaction.transactionId = '03971163'

After reading the documentation, I also tried the following standard SQL query, and its result contains the same two rows:

select 
title
from
(
  SELECT
    ARRAY(SELECT transaction.transactionId FROM UNNEST(hits)
          WHERE transaction.transactionId = '03971163') AS title
  FROM `ga-export-TTTTT.1234567890.ga_sessions_*`
  )
WHERE ARRAY_LENGTH(title) > 0;

Thank you very much for your help.

2 answers:

Answer 0 (score: 0)

Try this instead:

#standardSQL
SELECT 
  ARRAY(SELECT transaction.transactionId
        FROM UNNEST(hits)
        WHERE transaction.transactionId = '03971163')
    AS transactionIds
FROM `ga-export-TTTT.1234567890.ga_sessions_*`;

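When you use a CROSS JOIN with an array (the comma before UNNEST is an implicit CROSS JOIN), you get one result row for each array element. If you want a one-to-one correspondence with the rows of the table, you can instead use an ARRAY subquery (as in the query above) to "repack" the array elements after applying the filter. As another example, you can combine an ARRAY subquery with SELECT AS STRUCT: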

#standardSQL
SELECT 
  ARRAY(SELECT AS STRUCT transaction.*
        FROM UNNEST(hits)
        WHERE transaction.transactionId = '03971163')
    AS transactions
FROM `ga-export-TTTT.1234567890.ga_sessions_*`;

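This returns an array of all of the fields within transaction for the hits matching the condition transaction.transactionId = '03971163'. If you only want a single element of the array, you can use a subquery with LIMIT in the select list: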

#standardSQL
SELECT 
 (SELECT transaction.transactionId
  FROM UNNEST(hits)
  WHERE transaction.transactionId = '03971163'
  LIMIT 1)
    AS transactionId
FROM `ga-export-TTTT.1234567890.ga_sessions_*`;

Or:

#standardSQL
SELECT 
 (SELECT transaction
  FROM UNNEST(hits)
  WHERE transaction.transactionId = '03971163'
  LIMIT 1)
    AS transaction
FROM `ga-export-TTTT.1234567890.ga_sessions_*`;
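To see the row multiplication described above in isolation, here is a minimal sketch that needs no table. The `sessions` CTE and its values are hypothetical stand-ins for a single `ga_sessions_` row whose `hits` array happens to carry the same transaction ID on two hits; the real export schema has many more fields.

```sql
#standardSQL
-- Hypothetical inline data standing in for one ga_sessions_ row:
-- a single session whose hits array carries the same transactionId twice.
WITH sessions AS (
  SELECT [
    STRUCT(STRUCT('03971163' AS transactionId) AS transaction),
    STRUCT(STRUCT('03971163' AS transactionId) AS transaction),
    STRUCT(STRUCT('11111111' AS transactionId) AS transaction)
  ] AS hits
)
SELECT
  -- Comma join with UNNEST: one row per matching array element.
  (SELECT COUNT(*)
   FROM sessions, UNNEST(hits) AS h
   WHERE h.transaction.transactionId = '03971163') AS rows_from_join,
  -- ARRAY subquery: one row per session with at least one match.
  (SELECT COUNT(*)
   FROM sessions
   WHERE ARRAY_LENGTH(ARRAY(
     SELECT h.transaction.transactionId
     FROM UNNEST(hits) AS h
     WHERE h.transaction.transactionId = '03971163')) > 0) AS rows_from_array;
```

With this sample data, `rows_from_join` should be 2 and `rows_from_array` should be 1, because the join emits a row per matching hit while the ARRAY form keeps one row per session.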

Answer 1 (score: 0)

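The only difference I see between the two queries (the legacy and standard SQL versions) is that you may be querying different sets of tables!

In the legacy SQL version, you query the tables for the period from TIMESTAMP('2016-09-01') to TIMESTAMP('2017-02-14'), whereas the standard SQL query covers all ga_sessions_ tables in the ga-export-TTTT.1234567890 dataset.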

Try the following to restrict the tables to the same list as in the legacy SQL query:

#standardSQL
SELECT hits2.transaction.transactionId as transactionId
FROM `ga-export-TTTT.1234567890.ga_sessions_*`, UNNEST (hits) as hits2
WHERE hits2.transaction.transactionId = '03971163'
AND _TABLE_SUFFIX BETWEEN '20160901' AND '20170214'
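If it is unclear whether the extra row comes from a table outside the legacy date range, one way to check is to group the matches by `_TABLE_SUFFIX`. This is only a diagnostic sketch against the same placeholder dataset used in the question; the column aliases `table_suffix` and `matching_hits` are made up for illustration.

```sql
#standardSQL
-- Diagnostic sketch: list which daily tables contain the transaction
-- and how many hits match in each of them.
SELECT
  _TABLE_SUFFIX AS table_suffix,
  COUNT(*) AS matching_hits
FROM `ga-export-TTTT.1234567890.ga_sessions_*`, UNNEST(hits) AS hits2
WHERE hits2.transaction.transactionId = '03971163'
GROUP BY table_suffix
ORDER BY table_suffix;
```

A suffix outside '20160901' to '20170214' would point to the table-set difference described above, while a count greater than 1 within a single suffix would point to several matching hits (or sessions) in that day's table.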