BigQuery UNNEST on an array: getting duplicate rows

Asked: 2018-11-19 04:44:25

Tags: arrays google-cloud-platform google-bigquery unnest

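I am working on GCP billing queries in BigQuery. When I extract the labels array together with the cost, I get wrong values, because UNNEST returns the array elements as rows. So if a row's array has 2 elements, I get 2 rows, each carrying the same cost.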

For example:

Actual array:

SELECT
  TO_JSON_STRING(labels), cost
FROM
  billing_export.gcp_billing_export
WHERE
  _PARTITIONTIME >= "2018-08-01 00:00:00"
  AND _PARTITIONTIME < "2018-09-01 00:00:00"
  AND billing_account_id = "xxx-62378F-xxx"
  AND TO_JSON_STRING(labels) = '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]'
  AND cost > 0
LIMIT 10


With UNNEST:

WITH cte AS (
  SELECT
    labels, cost
  FROM
    billing_export.gcp_billing_export
  WHERE
    _PARTITIONTIME >= "2018-08-01 00:00:00"
    AND _PARTITIONTIME < "2018-09-01 00:00:00"
    AND billing_account_id = "xxx-62378F-xxxx"
    AND TO_JSON_STRING(labels) = '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]'
    AND cost > 0
  LIMIT 10
)
SELECT labels, cost
FROM cte,
  UNNEST(labels) AS la


Question:

I don't want the cost value to be duplicated. Can someone help me with this query?

1 answer:

Answer 0 (score: 2)

Instead of

SELECT labels, cost FROM cte,
UNNEST(labels) AS la

try

SELECT la, cost FROM cte,
UNNEST(labels) AS la
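
To make the row multiplication concrete, here is a minimal Python sketch that models what `FROM cte, UNNEST(labels) AS la` does: each source row is paired with every element of its own array. It is only an illustration of the join semantics, not BigQuery code. The helper name `cross_join_unnest` and the cost value `1.5` are made up for the example; the two labels are the ones from the question.

```python
def cross_join_unnest(rows):
    """Model `FROM cte, UNNEST(labels) AS la`: one output row per array element."""
    out = []
    for row in rows:
        for la in row["labels"]:
            # The scalar column (cost) is repeated for every label of the row.
            out.append({"la": la, "cost": row["cost"]})
    return out


# Hypothetical billing row: labels from the question, cost invented for the sketch.
cte = [
    {
        "labels": [
            {"key": "application", "value": "scaled-server"},
            {"key": "department", "value": "hrd"},
        ],
        "cost": 1.5,
    }
]

unnested = cross_join_unnest(cte)

# Summing cost after the join counts it once per label.
total_before = sum(r["cost"] for r in cte)
total_after = sum(r["cost"] for r in unnested)

# Keeping a single label key leaves one row per source row again.
by_department = [r for r in unnested if r["la"]["key"] == "department"]
```

In this model `la` is a single key/value struct rather than the whole array, which is what selecting `la` instead of `labels` returns. The cost still appears once per label, so any later `SUM(cost)` over the unnested rows would need a filter on one key (the equivalent of `WHERE la.key = 'department'`) or has to be computed before unnesting.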
  

Update

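-- Note: SPLIT needs a STRING, so this assumes cte.labels holds the JSON text
-- (for example TO_JSON_STRING(labels)), not the original ARRAY<STRUCT>.
SELECT
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(kv, '$.key') key,
      JSON_EXTRACT_SCALAR(kv, '$.value') value
    FROM UNNEST(SPLIT(labels, '},{')) kv_temp,
    UNNEST([CONCAT('{', REGEXP_REPLACE(kv_temp, r'^\[{|}]$', ''), '}')]) kv
  ) labels,
  cost
FROM cte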
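
The string handling in the query above can be hard to follow, so here is a small Python sketch of the same steps: split the JSON text on `},{`, strip the leading `[{` and trailing `}]` from each piece, wrap the piece in braces again, and read `key` and `value` from the result. This is a model of the logic only, assuming `labels` is JSON text shaped like the string in the question; `parse_labels` is a hypothetical helper name, and the regular expression is the same pattern with the braces escaped.

```python
import json
import re


def parse_labels(labels_json):
    """Rebuild a list of {key, value} pairs from a JSON array string."""
    pairs = []
    # Mirrors SPLIT(labels, '},{'): one piece per label object.
    for kv_temp in labels_json.split("},{"):
        # Mirrors REGEXP_REPLACE(...) plus CONCAT('{', ..., '}').
        kv = "{" + re.sub(r"^\[\{|\}\]$", "", kv_temp) + "}"
        obj = json.loads(kv)
        # Mirrors the two JSON_EXTRACT_SCALAR calls.
        pairs.append({"key": obj["key"], "value": obj["value"]})
    return pairs


labels = '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]'
parsed = parse_labels(labels)
```

Because the whole array is rebuilt inside one expression per source row, the outer query still returns one row per cost value. One caveat of splitting on a literal separator: a label value that itself contains `},{` would be cut in the wrong place.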