BigQuery SELECT problem when using AS

Time: 2016-03-24 08:19:06

Tags: sql select google-bigquery

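We love BigQuery. It is great and we use it all the time, but we have run into a problem with SELECT statements. For some reason, an expression that refers to an alias defined earlier in the same SELECT list does not work, whereas spelling out the full expression in place of the alias does.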

The error I get is:

Error: (L5:47): Expression 'RequestsPerSession' is not present in the GROUP BY list

Of course, we do not want to group by this alias, which is why it is not in the GROUP BY clause.

This does not work:

SELECT
  ch,
  COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(Requests/UniqueSessions,1) AS RequestsPerSession

This does work:

SELECT
  ch,
  COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession

Any ideas on how to fix this?

Also, is there a way to create a variable inside the SELECT for calculation purposes without having it show up in the final result?

2 answers:

Answer 0 (score: 1)

  

Any ideas on how to fix this?

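The GROUP BY clause lets you group rows that have the same value for a given field or set of fields, so that you can compute aggregates over related fields. Consequently, the SELECT list may contain only the grouped fields or aggregations.
Given that, your first example fails as expected and your second example works as expected.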

So there is nothing to fix!

  

Is there a way to create a variable inside the SELECT for calculation purposes without having it show up in the final result?

You can use a subquery for this. For example, suppose your original query is:

SELECT
  ch,
  COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession
FROM YourTable
GROUP BY ch

With a subquery, it would look like this:

SELECT
  ch,
  COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
  COUNT(request) AS Requests,
  ROUND(COUNT(request)/COUNT(DISTINCT(AC_SessionID),15000000),1) AS RequestsPerSession
FROM (
  SELECT 
    ch, 
    AC_SessionID, 
    IF(Action CONTAINS "request_data",1,NULL) AS request
  FROM YourTable
)
GROUP BY ch

which can be further "transformed" into:

SELECT 
  ch,
  UniqueSessions,
  Requests,
  ROUND(Requests/UniqueSessions, 1) AS RequestsPerSession
FROM (
  SELECT
    ch,
    COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
    COUNT(request) AS Requests
  FROM (
    SELECT 
      ch, 
      AC_SessionID, 
      IF(Action CONTAINS "request_data",1,NULL) AS request
    FROM YourTable
  )
  GROUP BY ch
)

How far to take this kind of "optimization" is a matter of personal preference, I think.
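
As a further illustration of the same idea, here is a minimal sketch of how the nested form above might be written in BigQuery standard SQL. This is an assumption on my part: every query in this thread uses the legacy dialect. The dataset name `mydataset` is hypothetical, `COUNTIF` with `LIKE` stands in for the legacy `COUNT(IF(... CONTAINS ...))`, and an exact `COUNT(DISTINCT ...)` replaces the legacy form that takes a threshold argument.

```sql
#standardSQL
-- Assumption: mydataset.YourTable is a placeholder for the real table.
WITH per_channel AS (
  -- Aggregate once per channel; these aliases become ordinary columns
  -- that the outer SELECT can reference by name.
  SELECT
    ch,
    COUNT(DISTINCT AC_SessionID) AS UniqueSessions,
    COUNTIF(Action LIKE '%request_data%') AS Requests
  FROM mydataset.YourTable
  GROUP BY ch
)
SELECT
  ch,
  UniqueSessions,
  Requests,
  -- SAFE_DIVIDE returns NULL instead of failing when UniqueSessions is 0.
  ROUND(SAFE_DIVIDE(Requests, UniqueSessions), 1) AS RequestsPerSession
FROM per_channel
```

Any column computed in `per_channel` that is left out of the outer SELECT list is used for the calculation but does not appear in the final result.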

  

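Regarding the subquery option: doesn't it increase the amount of data we run over, since it reads the data twice (almost double)? We have a large amount of data, so cost really does start to become an issue.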

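Within the same query you can have multiple subqueries against the same table, and you are charged for it only once! At least, that is how billing works so far, so you should not worry about it. BigQuery is smart enough to optimize actual data usage, so I do not think performance should be much of a concern either.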

Answer 1 (score: 0)

  

Any ideas on how to fix this?

Try the following "workaround":

SELECT
  ch,
  COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(Requests/UniqueSessions,1) * MAX(1) AS RequestsPerSession

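It looks to me as though the engine needs a hint to understand that this particular field (RequestsPerSession) is meant to be an aggregation rather than a grouping field.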

Here is how I tested it:

SELECT id, 
  COUNT(a) AS b,
  SUM(a) AS c,
  COUNT(a)/SUM(a) AS k1,
  b/SUM(a) AS k2,
  COUNT(a)/c AS k3,
  MAX(1) * (b/c) AS k4
FROM
(SELECT 1 AS id, 1 AS a),
(SELECT 1 AS id, 2 AS a),
(SELECT 1 AS id, 3 AS a),
(SELECT 1 AS id, 4 AS a),
(SELECT 2 AS id, 1 AS a),
(SELECT 2 AS id, 2 AS a),
(SELECT 2 AS id, 3 AS a)
GROUP BY id
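
For comparison, the same inline test rows can be run through the subquery form suggested in the other answer. This is only a sketch in the legacy dialect used above, and the output alias `k` is a name I made up for illustration: the inner query computes the aggregates `b` and `c`, and the outer query uses them as plain columns with no hint needed.

```sql
-- Legacy SQL: a comma between the inline subselects acts as UNION ALL.
SELECT
  id,
  b,
  c,
  b/c AS k  -- hypothetical alias; b and c are ordinary columns here
FROM (
  SELECT id,
    COUNT(a) AS b,
    SUM(a) AS c
  FROM
    (SELECT 1 AS id, 1 AS a),
    (SELECT 1 AS id, 2 AS a),
    (SELECT 1 AS id, 3 AS a),
    (SELECT 1 AS id, 4 AS a),
    (SELECT 2 AS id, 1 AS a),
    (SELECT 2 AS id, 2 AS a),
    (SELECT 2 AS id, 3 AS a)
  GROUP BY id
)
```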