Summing header-level information when querying at the detail level

Time: 2020-08-24 20:03:20

Tags: google-bigquery

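I have data like this. It is a sample Teradata log in which CPU and IO are captured at the QueryID level. I have parsed the query text for each QueryID to identify the databases and tables it references. After parsing the query and breaking it down into detail rows, I cannot split the captured CPU and IO across those detail rows; they are header-level attributes of the query.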

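I am now displaying the data in Data Studio. When I use the DatabaseReferred or TablesReferred field on the dashboard to get a distinct count of the tables referenced in a query, CPU and IO are duplicated because the data is UNNESTed internally, and the totals blow up when I aggregate them.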
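
A minimal sketch of that duplication in BigQuery standard SQL, assuming the array layout shown in the sample data (the CTE name `sample` and the alias `Sum_CPU_inflated` are made up for illustration). Joining one row against its own three-element array yields three rows, so the header-level CPU is counted three times:

```sql
#standardSQL
-- Hypothetical one-row input: the first sample query, databases only.
WITH sample AS (
  SELECT 'ABC' AS Username, 1234 AS QueryId, 100 AS CPU,
    ['DB1','DB2','DB1'] AS DatabaseReferred
)
SELECT
  Username,
  SUM(CPU) AS Sum_CPU_inflated,            -- 300, not 100: CPU repeats once per array element
  COUNT(DISTINCT db) AS DistinctDatabaseReferred  -- 2 (DB1, DB2)
FROM sample, UNNEST(DatabaseReferred) AS db  -- the comma is a cross join with the row's own array
GROUP BY Username
```

The distinct count comes out right, but any SUM over a header-level column is multiplied by the array length.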

Can you give me an idea of how to sum CPU only once per query while still counting the distinct DatabaseReferred and TablesReferred values in that query?

The input data looks like this:

Row    Username   QueryId    CPU    IO    DatabaseReferred TablesReferred 
1)     ABC        1234       100    123   DB1              TB1
                                          DB2              TB2
                                          DB1              TB3
2)     ABC        8454       589    565   DB1              TB3
                                          DB2              TB6
3)     ABC        3564       145    243   DB3              TB4
                                          DB5              TB3
4)     PQR        6352       737    562   DB2              TB6
                                          DB1              TB7
                                          DB1              TB2
5)     PQR        2345       200    126   DB2              TB5
                                          DB1              TB1


The output should look like this:

Username  Count(DistinctQueryID)  Sum(CPU)  SUM(IO)  DistinctDatabaseReferred DistinctTablesReferred 
ABC          3                     834       931           4                         5
PQR          2                     937       688           2                         5

For quick reference, here is the input data as SELECT statements that can be wrapped in a WITH clause in the solution:

SELECT 'ABC' username, cast('1234' as int64) QueryID, cast('100' as int64) CPU, cast('123' as int64) IO, ['DB1','DB2','DB1'] DatabaseReferred, ['TB1','TB2','TB3'] TablesReferred 
  UNION ALL
  SELECT 'ABC' username, cast('8454' as int64) QueryID, cast('589' as int64) CPU, cast('565' as int64) IO, ['DB1','DB2'] DatabaseReferred, ['TB3','TB6'] TablesReferred 
  UNION ALL
  SELECT 'ABC' username, cast('3564' as int64) QueryID, cast('145' as int64) CPU, cast('243' as int64) IO, ['DB3','DB5'] DatabaseReferred, ['TB4','TB3'] TablesReferred 
  UNION ALL
  SELECT 'PQR' username, cast('6352' as int64) QueryID, cast('737' as int64) CPU, cast('562' as int64) IO, ['DB2','DB1','DB1'] DatabaseReferred, ['TB6','TB7','TB2'] TablesReferred 
  UNION ALL
  SELECT 'PQR' username, cast('2345' as int64) QueryID, cast('200' as int64) CPU, cast('126' as int64) IO, ['DB2','DB1'] DatabaseReferred, ['TB5','TB1'] TablesReferred 

1 answer:

Answer 0: (score: 1)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT 
  Username, 
  Count_of_Distinct_QueryId, 
  Sum_CPU, 
  Sum_IO,
  (SELECT COUNT(DISTINCT db) FROM t.dbs AS db) AS DistinctDatabaseReferred,
  (SELECT COUNT(DISTINCT tbl) FROM t.tbls AS tbl) AS DistinctTablesReferred
FROM (
  SELECT Username,
    COUNT(DISTINCT QueryId) AS Count_of_Distinct_QueryId,
    SUM(CPU) AS Sum_CPU,
    SUM(IO) AS Sum_IO,
    ARRAY_CONCAT_AGG(DatabaseReferred) dbs,
    ARRAY_CONCAT_AGG(TablesReferred) tbls
  FROM `project.dataset.table`
  GROUP BY Username
) t   

If applied to the sample data from your question, the output is:

Row Username    Count_of_Distinct_QueryId   Sum_CPU Sum_IO  DistinctDatabaseReferred    DistinctTablesReferred   
1   ABC         3                           834     931     4                           5    
2   PQR         2                           937     688     2                           5
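
To see why this avoids the double counting, here is a reduced sketch using only the two PQR rows and the database column (the CTE name `two_queries` and the column `concatenated_length` are hypothetical, added for illustration). CPU is summed while each query is still a single row; the arrays are only concatenated per user, and the distinct count is taken afterwards inside a scalar subquery, so no join ever multiplies the header-level values:

```sql
#standardSQL
-- Hypothetical two-row input: the PQR rows from the sample data, databases only.
WITH two_queries AS (
  SELECT 'PQR' AS Username, 6352 AS QueryId, 737 AS CPU,
    ['DB2','DB1','DB1'] AS DatabaseReferred
  UNION ALL
  SELECT 'PQR', 2345, 200, ['DB2','DB1']
)
SELECT
  Username,
  Sum_CPU,                                   -- 937 = 737 + 200, each query counted once
  ARRAY_LENGTH(dbs) AS concatenated_length,  -- 5 = 3 + 2 elements, duplicates still present
  (SELECT COUNT(DISTINCT db) FROM UNNEST(dbs) AS db) AS DistinctDatabaseReferred  -- 2
FROM (
  SELECT
    Username,
    SUM(CPU) AS Sum_CPU,
    ARRAY_CONCAT_AGG(DatabaseReferred) AS dbs  -- one combined array per user
  FROM two_queries
  GROUP BY Username
)
```

Note that this relies on the log holding one row per QueryId; if a QueryId could appear in several rows, the inner SUM would need deduplicating first.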