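Hello, I'm trying to get query costs from the audit logs. I got the total, but when I try to break it down by dataset I get this error:
Cannot access field datasetId on a value with type ARRAY<STRUCT<...>>
This is the query I'm trying to run: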
WITH
data AS (
SELECT
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
(
SELECT
ARRAY_TO_STRING((
SELECT
ARRAY_AGG(datasetId)
FROM
UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId) ))) AS datasetIds
FROM
`kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206` )
SELECT
datasetIds,
FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
data
WHERE
jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY
datasetIds
ORDER BY
Estimated_USD_Cost DESC
I'm using the Standard SQL dialect.
How can I convert this field:
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId
from an array to a string? What am I missing? Thanks.
Answer 0 (score: 1)
You need to UNNEST the outer array (referencedTables) so that you can select datasetId from its elements:
SELECT
ARRAY_TO_STRING((
SELECT ARRAY_AGG(datasetId)
FROM UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables)
), ',') AS datasetIds
FROM ...
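To see why the original expression fails, it may help to try the same pattern on a small literal array. The sketch below is self-contained and does not touch the audit log table: the `sample` CTE and its values (`ds_a`, `ds_b`, `t1`, `t2`) are made up for illustration. A field such as datasetId belongs to each STRUCT element, not to the ARRAY itself, so the array has to be unnested before the field can be read.

```sql
#standardSQL
-- Hypothetical sample data standing in for jobStatistics.referencedTables.
WITH sample AS (
  SELECT [
    STRUCT('ds_a' AS datasetId, 't1' AS tableId),
    STRUCT('ds_b' AS datasetId, 't2' AS tableId)
  ] AS referencedTables
)
SELECT
  -- referencedTables.datasetId would fail here: the field lives on each
  -- STRUCT element, so unnest the array first and then read the field.
  ARRAY_TO_STRING(
    ARRAY(
      SELECT ref.datasetId
      FROM UNNEST(referencedTables) AS ref WITH OFFSET AS pos
      ORDER BY pos  -- keep the original element order
    ),
    ','  -- ARRAY_TO_STRING requires a delimiter argument
  ) AS datasetIds
FROM sample
```

Note that the query in the question also omits the delimiter argument of ARRAY_TO_STRING, which is required.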
Answer 1 (score: 1)
The following is for BigQuery Standard SQL:
#standardSQL
WITH data AS (
SELECT
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
ref.datasetId AS datasetId
FROM `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206`,
UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables) ref
)
SELECT
datasetId,
FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM data
WHERE jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY datasetId
ORDER BY Estimated_USD_Cost DESC
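As you can see, you clearly need to UNNEST the referencedTables ARRAY, but you also need to make sure the final cost calculation is as close to the correct value as possible. The same query can reference several tables in the same dataset, and the query above then counts that job's billed bytes once per table, so it is better to use DISTINCT in the CTE. The same query can also reference tables from several datasets. In that case the same billed bytes are attributed in full to each of those datasets, so the per-dataset figures add up to more than the real total and you will overestimate the cost! I don't know your exact intent, but you may need to introduce some logic to distribute the cost among the referenced datasets.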
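One possible way to handle both caveats is sketched below. It is only an illustration. It assumes each audit log row describes one completed job, and it splits a job's billed bytes evenly across the distinct datasets the job referenced. The even split is a hypothetical allocation rule, not something the audit log prescribes, and you may prefer a different one. The 5.0 USD per TiB factor is taken from the query in the question.

```sql
#standardSQL
WITH data AS (
  SELECT
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
    -- One entry per distinct dataset, however many of its tables the job read.
    ARRAY(
      SELECT DISTINCT ref.datasetId
      FROM UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables) AS ref
      WHERE ref.datasetId IS NOT NULL
    ) AS datasetIds
  FROM `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206`
)
SELECT
  datasetId,
  -- Assumption: divide each job's billed bytes evenly among its datasets,
  -- so the per-dataset figures add up to the total across these jobs.
  FORMAT('%9.2f',
    5.0 * SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes / ARRAY_LENGTH(datasetIds))
        / POWER(2, 40)) AS Estimated_USD_Cost
FROM data, UNNEST(datasetIds) AS datasetId
WHERE jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY datasetId
ORDER BY Estimated_USD_Cost DESC
```

Jobs with no referenced tables produce an empty datasetIds array and drop out of the result because of the join with UNNEST, so their cost is not attributed to any dataset here.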