Replace nested/repeated fields

Posted: 2018-06-01 11:17:07

Tags: sql google-bigquery

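I want to replace a repeated field (hits.customdimensions.value) while leaving the rest of the data unchanged. Say I want to MD5-hash hits.customdimensions.value wherever hits.customdimensions.index is between 1 and 5, and replace the original value with the hashed one.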

For the top-level customdimensions.value, this seems to do the job:

select x.* except (customdimensions), cd.index as cdindex, MD5(cd.value) as cdvalue
from `datasetx.tabley` x, unnest(customdimensions) as cd
where cd.index between 1 and 5

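Unfortunately, if I try the same thing with hits.customdimensions.value, I have to use EXCEPT (hits), because I cannot exclude only hit.customdimensions. That makes all the other hit-level columns disappear: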

select x.* except (hits), hitcd.index as hitcdindex, MD5(hitcd.value) as hitcdvalue
from `datasetx.tabley` x, unnest(hits) as hit, unnest(hit.customdimensions) as hitcd
where hitcd.index between 1 and 5

Is there a simple solution to this?

1 answer:

Answer 0 (score: 1)

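The query below fully preserves the whole table structure and data, and replaces hits.customdimensions.value with its hash wherever hits.customdimensions.index is between 1 and 5. MD5 returns BYTES, so the hash is wrapped in TO_BASE64 to keep the column a STRING; otherwise the two branches of the IF would have different types.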

   
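#standardSQL
SELECT *
  REPLACE( ARRAY(
    -- Rebuild each hit, keeping all of its other fields as they are
    SELECT AS STRUCT *
      REPLACE( ARRAY(
        -- Rebuild the hit's custom dimensions, hashing only indexes 1 to 5
        SELECT AS STRUCT
          index,
          IF(index BETWEEN 1 AND 5, TO_BASE64(MD5(value)), value) AS value
        FROM UNNEST(customdimensions)
      ) AS customdimensions
    )
    FROM UNNEST(hits)
  ) AS hits
)
FROM `datasetx.tabley`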
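The same REPLACE pattern can be applied to the session-level customdimensions (the column the question's first query targets) in the same pass. The sketch below is an extension under stated assumptions, not part of the original answer: the `tabley` CTE and its sample values (visitId, hitNumber and the strings) are hypothetical stand-ins for `datasetx.tabley`, and it adds WITH OFFSET ... ORDER BY because ARRAY(SELECT ...) only guarantees element order when the subquery has an ORDER BY.

```sql
#standardSQL
-- Hypothetical sample row standing in for `datasetx.tabley`
WITH tabley AS (
  SELECT
    'session-1' AS visitId,
    [STRUCT(1 AS index, 'a' AS value), STRUCT(7 AS index, 'b' AS value)] AS customdimensions,
    [STRUCT(
       1 AS hitNumber,
       [STRUCT(2 AS index, 'c' AS value), STRUCT(9 AS index, 'd' AS value)] AS customdimensions
     )] AS hits
)
SELECT * REPLACE (
  -- Session-level custom dimensions: hash only indexes 1 to 5, keep array order
  ARRAY(
    SELECT AS STRUCT
      cd.index,
      IF(cd.index BETWEEN 1 AND 5, TO_BASE64(MD5(cd.value)), cd.value) AS value
    FROM UNNEST(customdimensions) AS cd WITH OFFSET AS cd_off
    ORDER BY cd_off
  ) AS customdimensions,
  -- Hit-level custom dimensions: every other hit field passes through unchanged
  ARRAY(
    SELECT AS STRUCT h.* REPLACE (
      ARRAY(
        SELECT AS STRUCT
          hcd.index,
          IF(hcd.index BETWEEN 1 AND 5, TO_BASE64(MD5(hcd.value)), hcd.value) AS value
        FROM UNNEST(h.customdimensions) AS hcd WITH OFFSET AS hcd_off
        ORDER BY hcd_off
      ) AS customdimensions
    )
    FROM UNNEST(hits) AS h WITH OFFSET AS h_off
    ORDER BY h_off
  ) AS hits
)
FROM tabley
```

On the sample row, session-level index 1 ('a') should come back as DMF1ucDxtqgxw5niaXcmYQ== (the Base64 form of MD5('a')), while index 7 ('b'), hit-level index 9 ('d') and hitNumber are left as they were. To run it on the real table, drop the WITH clause and select FROM `datasetx.tabley`; TO_HEX(MD5(...)) is an alternative if a hex string is preferred over Base64.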