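We love BigQuery; it's awesome and we use it constantly. However, we have run into an issue with a SELECT statement. For some reason, if you build a SELECT expression from a variable (alias) you defined earlier in the same SELECT, it doesn't work, but it does work when you write out the full code that defines the variable.
The error I get is as follows: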
Error: (L5:47): Expression 'RequestsPerSession' is not present in the GROUP BY list
Of course we don't want to aggregate based on this variable, which is why it isn't in the GROUP BY section.
This does not work:
SELECT
ch,
COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(Requests/UniqueSessions,1) AS RequestsPerSession
This does work:
SELECT
ch,
COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession
Any ideas on how to fix it?
Also, is there a way to create a variable in the SELECT for calculation purposes without having it show up in the final results?
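Answer 0 (score: 1)
"Any ideas on how to fix it?"
The GROUP BY clause lets you group rows that have the same values for a given field or set of fields so that you can compute aggregations over related fields. So in the SELECT list you can have either fields that are being grouped by, or aggregations.
Accepting the above, your first example fails as expected and the second works as expected. ROUND(Requests/UniqueSessions,1) refers only to aliases and contains no aggregate function of its own, so the engine appears to treat it as a plain expression that would have to be listed in GROUP BY.
So, there is nothing to fix!
"Is there a way to create a variable in the select function for calculation purposes but not have it show in the final results?"
You can use a subquery for this. For example, assume your original query is: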
SELECT
ch,
COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession
FROM YourTable
GROUP BY ch
With a subquery, it can look like this:
SELECT
ch,
COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
COUNT(request) AS Requests,
ROUND(COUNT(request)/COUNT(DISTINCT(AC_SessionID),15000000),1) AS RequestsPerSession
FROM (
SELECT
ch,
AC_SessionID,
IF(Action CONTAINS "request_data",1,NULL) AS request
FROM YourTable
)
GROUP BY ch
which can be further "transformed" into:
SELECT
ch,
UniqueSessions,
Requests,
ROUND(Requests/UniqueSessions, 1) AS RequestsPerSession
FROM (
SELECT
ch,
COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
COUNT(request) AS Requests,
FROM (
SELECT
ch,
AC_SessionID,
IF(Action CONTAINS "request_data",1,NULL) AS request
FROM YourTable
)
GROUP BY ch
)
How far to take this kind of "optimization" is a matter of personal preference, I think.
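To try the nested pattern without touching BigQuery, here is a minimal local sketch using Python's built-in sqlite3 module as a stand-in. The sample rows are made up for illustration, and the SQL is adapted to SQLite as an assumption: instr(...) > 0 replaces CONTAINS, a plain COUNT(DISTINCT ...) replaces the thresholded legacy form, and 1.0 * forces non-integer division. It is not the BigQuery query itself, only the same shape: aggregate in the inner query, derive the ratio from the aliases in the outer one.

```python
import sqlite3

# Hypothetical sample rows standing in for YourTable (ch, AC_SessionID, Action).
rows = [
    ("web", "s1", "request_data"),
    ("web", "s1", "page_view"),
    ("web", "s2", "page_view"),
    ("app", "s3", "request_data"),
    ("app", "s3", "request_data"),
    ("app", "s4", "request_data"),
    ("app", "s4", "page_view"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ch TEXT, AC_SessionID TEXT, Action TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Inner query: one row per ch with the two aggregates.
# Outer query: reuses the aliases; no GROUP BY is needed at this level.
query = """
SELECT
  ch,
  UniqueSessions,
  Requests,
  ROUND(1.0 * Requests / UniqueSessions, 1) AS RequestsPerSession
FROM (
  SELECT
    ch,
    COUNT(DISTINCT AC_SessionID) AS UniqueSessions,
    COUNT(CASE WHEN instr(Action, 'request_data') > 0 THEN 1 END) AS Requests
  FROM YourTable
  GROUP BY ch
)
ORDER BY ch
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
# ('app', 2, 3, 1.5)
# ('web', 2, 1, 0.5)
```

Because the outer SELECT only sees the already aggregated rows, any intermediate column can be used there for calculation and simply left out of the final column list.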
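"With the subquery option, doesn't it increase the amount of data we run, since it is twice (almost double) the data? We have a large amount of data, so cost does start to become an issue."
Within the same query you can have multiple subqueries against the same table and you will be charged only once for it! That is how billing works as of now, so you shouldn't worry about it. BigQuery is smart enough to optimize the actual data usage, so I don't think performance should be much of a concern either.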
Answer 1 (score: 0)
"Any ideas on how to fix it?"
Try the following "workaround":
SELECT
ch,
COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(Requests/UniqueSessions,1) * MAX(1) AS RequestsPerSession
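I realized that it looks like the engine needs a hint to understand that this particular field (RequestsPerSession) is not meant for grouping but is an aggregation.
Here is how I tested it: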
SELECT id,
COUNT(a) AS b,
SUM(a) AS c,
COUNT(a)/SUM(a) AS k1,
b/SUM(a) AS k2,
COUNT(a)/c AS k3,
MAX(1) * (b/c) AS k4
FROM
(SELECT 1 AS id, 1 AS a),
(SELECT 1 AS id, 2 AS a),
(SELECT 1 AS id, 3 AS a),
(SELECT 1 AS id, 4 AS a),
(SELECT 2 AS id, 1 AS a),
(SELECT 2 AS id, 2 AS a),
(SELECT 2 AS id, 3 AS a),
GROUP BY id
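As a cross-check on what k1 through k4 are meant to agree on, the same seven rows can be aggregated locally. This is only a sketch using Python's sqlite3 module as a stand-in, not BigQuery: legacy SQL treats the commas between subqueries in FROM as UNION ALL, so that is written out explicitly, and the ratio is computed in an outer query because SQLite does not accept the alias references that the workaround relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same (id, a) rows as the test query above, combined with UNION ALL.
query = """
SELECT id, b, c, 1.0 * b / c AS k
FROM (
  SELECT id, COUNT(a) AS b, SUM(a) AS c
  FROM (
    SELECT 1 AS id, 1 AS a UNION ALL
    SELECT 1, 2 UNION ALL
    SELECT 1, 3 UNION ALL
    SELECT 1, 4 UNION ALL
    SELECT 2, 1 UNION ALL
    SELECT 2, 2 UNION ALL
    SELECT 2, 3
  )
  GROUP BY id
)
ORDER BY id
"""

ratios = conn.execute(query).fetchall()
for row in ratios:
    print(row)
# (1, 4, 10, 0.4)
# (2, 3, 6, 0.5)
```

If the workaround behaves as described, each of k1 to k4 should come out as 0.4 for id 1 and 0.5 for id 2.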