BigQuery UPDATE of a nested array field

Time: 2018-05-04 14:02:27

Tags: sql nested sql-update google-bigquery

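I need to update a nested field in one table using values from another table. Building on an earlier solution, I came up with something that works, but not quite the way I want. Here is my attempt: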

#standardSQL
UPDATE
  `attribution.daily_sessions_20180301_copy1` AS target
SET
hits = ARRAY(
  SELECT AS STRUCT * REPLACE(ARRAY(
    SELECT AS STRUCT *
    FROM(
      SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits)
)
FROM
  `attribution.attribute_category_map`
AS map
WHERE
  (
    SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,
    UNNEST(h.product) AS prod LIMIT 1) = map.raw_name

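attribute_category_map is a two-column table: I look up the matching value in column 1 and replace the data in the target table with the value from column 2. The best result I have achieved updates every nested field in a row with the same value, which is correct only for the first nested field, instead of updating each nested field with its own specific value.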

Simplified schema of the main table:

[  
   {  
      "name":"sessionId",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"hits",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[  
         {  
            "name":"product",
            "type":"RECORD",
            "mode":"REPEATED",
            "fields":[  
               {  
                  "name":"productCategory",
                  "type":"STRING",
                  "mode":"NULLABLE"
               },
               {  
                  "name":"productCategoryAttribute",
                  "type":"STRING",
                  "mode":"NULLABLE"
               }
            ]
         }
      ]
   }
]

A session row usually contains several hits, and a hit contains several products. The values look like this (when unnested):

-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hits.product.productCategoryAttribute
-----------------------------------------------------------------------------
1         | automotive chemicals        | null
1         | automotive tools            | null
1         | null                        | null
2         | null                        | null
2         | automotive chemicals        | null
2         | null                        | null
3         | null                        | null
3         | bed accessories             | null
4         | null                        | null
4         | null                        | null
4         | automotive chemicals        | null
4         | null                        | null
-----------------------------------------------------------------------------

Schema of the map table:

[  
   {  
      "name":"raw_name",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"category",
      "type":"STRING",
      "mode":"NULLABLE"
   }
]

Its values look like this:

---------------------------------------------------
raw_name              |category                   |
---------------------------------------------------
automotive chemicals  |d1y2 - automotive chemicals|
automotive paint      |dijf1 - automotive paint   |
automotive tools      |efw1 - automotive tools    |
baby & infant toys    |wwfw - baby & infant toys  |
batteries & power     |fdsv- batteries & power    |
bed accessories       |0k77 - bed accessories     |
bike racks            |12df - bike racks          |
--------------------------------------------------

The result I want is:

-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hits.product.productCategoryAttribute
-----------------------------------------------------------------------------
1         | automotive chemicals        | d1y2 - automotive chemicals
1         | automotive tools            | efw1 - automotive tools
1         | null                        | null
2         | null                        | null
2         | automotive chemicals        | d1y2 - automotive chemicals
2         | null                        | null
3         | null                        | null
3         | bed accessories             | 0k77 - bed accessories
4         | null                        | null
4         | null                        | null
4         | automotive chemicals        | d1y2 - automotive chemicals
4         | null                        | null
-----------------------------------------------------------------------------

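I need to take the productCategory value from the main table, look it up in the raw_name column of the map table, take the value from the category column, and put it into the productCategoryAttribute column of the main table. The main problem is that the target field is doubly nested, and I cannot figure out how to join on it directly.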
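To make the intended lookup concrete, the flattened form of the desired result can be sketched as a read-only query. This is only an illustration of the mapping, not the update itself. It uses the table and column names from the attempt above and assumes that productCategory matches raw_name exactly, without the LOWER/REPLACE clean-up used in that attempt.

```sql
#standardSQL
-- Read-only preview: nothing in the table is modified.
-- Assumption: productCategory matches raw_name as stored (no normalization).
SELECT
  target.sessionId,
  prod.productCategory,
  map.category AS productCategoryAttribute  -- NULL when no raw_name matches
FROM `attribution.daily_sessions_20180301_copy1` AS target
CROSS JOIN UNNEST(target.hits) AS h         -- one row per hit
CROSS JOIN UNNEST(h.product) AS prod        -- one row per product within a hit
LEFT JOIN `attribution.attribute_category_map` AS map
  ON map.raw_name = prod.productCategory
ORDER BY target.sessionId
```

Because of the cross joins, hits whose product array is empty do not appear in this preview. The difficulty described above is writing these per-product values back into the nested arrays, which a flat join alone does not do.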

1 answer:

Answer 0: (score: 3)

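The query below is tested!
It leaves the schema and data of the whole table as is and updates only the value of productCategoryAttribute according to the respective mapping.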

   
#standardSQL
UPDATE `project.dataset.your_table` t
SET hits = 
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      ARRAY(
        SELECT AS STRUCT product.* REPLACE(
          CASE WHEN map.raw_name = product.productCategory THEN category 
            ELSE productCategoryAttribute END AS productCategoryAttribute)
        FROM UNNEST(product) product
        LEFT JOIN UNNEST(agg_map.map) map 
        ON map.raw_name = product.productCategory
      ) AS product)
    FROM UNNEST(hits) hit
  ) 
FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map 
WHERE TRUE   

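Note: the above solution assumes that the map table is not that big, because it relies on aggregating the whole map table into a single array.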
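A related point worth checking before running the update: the LEFT JOIN against the aggregated map emits one row per matching map entry, so a raw_name that appears more than once in the map table would duplicate the corresponding product inside the rewritten array. A minimal sketch of such a check, using the placeholder table name `project.dataset.map` from the query above:

```sql
#standardSQL
-- Lists raw_name values that occur more than once in the map table.
-- An empty result means each product can match at most one map row.
SELECT
  raw_name,
  COUNT(*) AS occurrences
FROM `project.dataset.map`
WHERE raw_name IS NOT NULL   -- NULL never satisfies the equality join
GROUP BY raw_name
HAVING COUNT(*) > 1
```

If this returns rows, deduplicating the map table first (for example, keeping one category per raw_name) would keep the number of products per hit unchanged.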