How to update multiple rows in BigQuery

Date: 2019-08-22 11:53:43

Tags: google-bigquery

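I have only been collecting user_id for a few weeks (before that I only had user_pseudo_id), so I want to fill in the user_id values in the earlier part of my dataset, where they are NULL.

I found a solution here, but it does not fit my case because I have more than one user_pseudo_id per user_id: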

update multiple rows which is having null values

My code:

UPDATE `dataset.events`
    SET user_id = b.user_id
    FROM `dataset.events` a
        INNER JOIN (SELECT DISTINCT user_pseudo_id, user_id 
                    FROM `dataset.events`
                    WHERE user_id IS NOT NULL) b
            ON a.user_pseudo_id = b.user_pseudo_id
     WHERE a.user_id IS NULL

The code runs, but it modifies 0 rows and shows the following pop-up message: "UPDATE/MERGE must match at most one source row for each target row"

Update: my dataset:

user_pseudo_id  user_id
a               NULL
a               NULL
b               NULL
c               NULL
a               111
b               111
c               222

What I want:

user_pseudo_id  user_id
a               111
a               111
b               111
c               222
a               111
b               111
c               222

Note that the pseudo IDs a and b belong to the same user, so they share a single user_id.

1 answer:

Answer 0: (score: 1)

The following is for BigQuery Standard SQL

#standardSQL
WITH map AS (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_id, m.user_id) AS user_id
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)   

You can test and play with the above using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' user_pseudo_id, NULL user_id UNION ALL
  SELECT 'a', NULL UNION ALL
  SELECT 'b', NULL UNION ALL
  SELECT 'c', NULL UNION ALL
  SELECT 'a', '111' UNION ALL
  SELECT 'b', '111' UNION ALL
  SELECT 'c', '222' 
), map AS (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id) user_id
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)   

with the result

Row user_pseudo_id  user_id  
1   a               111  
2   a               111  
3   b               111  
4   c               222  
5   a               111  
6   b               111  
7   c               222    
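
The map step uses MIN(user_id) so that each user_pseudo_id resolves to exactly one user_id. If a pseudo ID ever appeared with two different user_id values, MIN would silently pick the smaller one. Below is a minimal sketch of a check for that case, reusing the sample data from the question; the column alias distinct_user_ids is only an illustrative name, not part of the original answer.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' user_pseudo_id, NULL user_id UNION ALL
  SELECT 'a', NULL UNION ALL
  SELECT 'b', NULL UNION ALL
  SELECT 'c', NULL UNION ALL
  SELECT 'a', '111' UNION ALL
  SELECT 'b', '111' UNION ALL
  SELECT 'c', '222'
)
-- List pseudo IDs that are linked to more than one distinct user_id
SELECT user_pseudo_id, COUNT(DISTINCT user_id) AS distinct_user_ids
FROM `project.dataset.table`
WHERE user_id IS NOT NULL
GROUP BY user_pseudo_id
HAVING COUNT(DISTINCT user_id) > 1
```

With the sample data this returns no rows, because a, b and c are each linked to a single user_id. On a real table, any row returned here is a pseudo ID for which the choice made by MIN deserves a second look.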

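Finally, you can wrap the same mapping in an UPDATE statement, as in the example below. Because the subquery returns exactly one row per user_pseudo_id and the WHERE clause ties it to the target alias t, each target row matches at most one source row, which avoids the error from the question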

#standardSQL
UPDATE `project.dataset.table` t
SET user_id = IFNULL(t.user_ID, map.user_id)
FROM (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id

Or you can restrict the update to only the rows where user_id is NULL, as in the example below

#standardSQL
UPDATE `project.dataset.table` t
SET user_id = map.user_id
FROM (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id
AND t.user_ID IS NULL
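
After running either UPDATE, it may be worth confirming what is left. The sketch below assumes the same placeholder table name used above and lists pseudo IDs that still have no user_id; the alias rows_without_user_id is hypothetical. These would be pseudo IDs that never appear together with a user_id anywhere in the table, so the mapping has nothing to fill in for them.

```sql
#standardSQL
-- Rows still missing user_id after the update, grouped by pseudo ID
SELECT user_pseudo_id, COUNT(*) AS rows_without_user_id
FROM `project.dataset.table`
WHERE user_id IS NULL
GROUP BY user_pseudo_id
ORDER BY rows_without_user_id DESC
```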