BigQuery multiple joins

Time: 2017-12-06 18:28:55

Tags: google-bigquery

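I have an Apache combined log file loaded into BigQuery. Its schema consists of resource, place_id, ip, start_time, end_time, device, and status. I am trying to run a query that counts the number of resources and the number of devices, grouped by resource and device.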

Table:

resource   |  place_id  |  device  |  ip      |  status  |
-----------------------------------------------------------------
/resource1 | 6750320008 |  android | x.x.x.x  |  200     |
/resource1 | 6750320100 |  ipad    | x.x.x.y  |  200     |
/resource2 | 6750320008 |  android | x.x.x.z  |  200     |

Query:

SELECT resource, device
FROM (
  Select 
    EXACT_COUNT_DISTINCT(resource) AS URL,
    1 AS scalar,
  FROM ([daily_logs.app_logs_data]) 
  WHERE place_id = '6750320008' GROUP BY URL) AS datal
JOIN (
  SELECT
    COUNT(device) as DeviceCount,
    1 AS scalar
  FROM ([daily_logs.app_logs_data]) GROUP BY DeviceCount) AS y
ON datal.scalar=y.scalar

I get this error: Error: Cannot group by an aggregate.

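I am essentially building two derived tables from the same table to count different things, and then I want to join them together, with the result grouped like this: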

     URL   |  totalresourcecount  |  device  |  totaldevicecount
-----------------------------------------------------------------
/resource1 |          1           |  android |         1 
/resource1 |          1           |  ipad    |         1 
/resource2 |          1           |  android |         1

I have read the Google BigQuery syntax help and looked at some examples, but nothing has produced the desired result. Thanks in advance!

1 answer:

Answer 0: (score: 1)

Below is a BigQuery Standard SQL query that reflects the logic described in the follow-up comments:

#standardSQL
SELECT resource, device, COUNT(1) cnt
FROM `project.dataset.yourtable`
WHERE place_id = '6750320008'
GROUP BY resource, device  
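
For context on the original error: in the question, `GROUP BY URL` and `GROUP BY DeviceCount` refer to aliases of aggregates (`EXACT_COUNT_DISTINCT(resource)` and `COUNT(device)`), which is what "Cannot group by an aggregate" is complaining about. Grouping by the plain columns avoids that, and no join is needed. If the query has to stay in legacy SQL, a rough equivalent might look like the sketch below. It assumes the table name from the question and that place_id is stored as a string, as the question's filter suggests.

```sql
#legacySQL
-- Sketch only: legacy SQL counterpart of the Standard SQL query above.
-- Assumes the table from the question and a STRING place_id column.
SELECT
  resource,
  device,
  COUNT(1) AS cnt  -- number of log rows per (resource, device) pair
FROM [daily_logs.app_logs_data]
WHERE place_id = '6750320008'
GROUP BY resource, device
```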

You can test or experiment with the Standard SQL query above using the following dummy data:

#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '/resource1' resource, '6750320008' place_id, 'android' device, 'x.x.x.x' ip, 200 status UNION ALL
  SELECT '/resource1', '6750320100', 'ipad', 'x.x.x.y', 200 UNION ALL
  SELECT '/resource2', '6750320008', 'android', 'x.x.x.z', 200
)
SELECT resource, device, COUNT(1) cnt
FROM `project.dataset.yourtable`
WHERE place_id = '6750320008'
GROUP BY resource, device   

Note: the above depends on how I understood the query logic you described in the follow-up comments.
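
The desired output in the question also shows a total column next to each resource and each device. If those are meant as totals across the grouped rows (an assumption, since the question does not define them precisely), one possible sketch is to layer analytic functions over the grouped result. The column names `total_resource_count` and `total_device_count` and the CTE name `grouped` are made up for illustration.

```sql
#standardSQL
WITH `project.dataset.yourtable` AS (
  -- Same dummy rows as in the test query.
  SELECT '/resource1' resource, '6750320008' place_id, 'android' device, 'x.x.x.x' ip, 200 status UNION ALL
  SELECT '/resource1', '6750320100', 'ipad', 'x.x.x.y', 200 UNION ALL
  SELECT '/resource2', '6750320008', 'android', 'x.x.x.z', 200
),
grouped AS (
  -- Same aggregation as the answer: one row per (resource, device) pair.
  SELECT resource, device, COUNT(1) AS cnt
  FROM `project.dataset.yourtable`
  WHERE place_id = '6750320008'
  GROUP BY resource, device
)
SELECT
  resource,
  SUM(cnt) OVER (PARTITION BY resource) AS total_resource_count,  -- hypothetical column name
  device,
  SUM(cnt) OVER (PARTITION BY device) AS total_device_count       -- hypothetical column name
FROM grouped
ORDER BY resource, device
```

Keeping the `GROUP BY` in its own step and computing the totals with `OVER (PARTITION BY ...)` avoids the self-join on a constant `scalar` column that the original query attempted.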