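I have only been using user_id for a few weeks (before that I only had user_pseudo_id), so I want to backfill the user_id values that are NULL in the earlier part of my dataset.
I found a solution here, but it does not fit my case because I have more than one user_pseudo_id for each user_id: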
update multiple rows which is having null values
My code:
UPDATE `dataset.events`
SET user_id = b.user_id
FROM `dataset.events` a
INNER JOIN (
  SELECT DISTINCT user_pseudo_id, user_id
  FROM `dataset.events`
  WHERE user_id IS NOT NULL
) b
ON a.user_pseudo_id = b.user_pseudo_id
WHERE a.user_id IS NULL
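The query is valid, but it modifies 0 rows and shows the following pop-up message: "UPDATE/MERGE must match at most one source row for each target row".
Update: my dataset: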
user_pseudo_id    user_id
a                 NULL
a                 NULL
b                 NULL
c                 NULL
a                 111
b                 111
c                 222
What I want:
user_pseudo_id    user_id
a                 111
a                 111
b                 111
c                 222
a                 111
b                 111
c                 222
Note that the pseudo IDs a and b belong to the same user, so they share a single user_id.
Answer 0 (score: 1)
The following is for BigQuery Standard SQL:
#standardSQL
WITH map AS (
  -- One user_id per user_pseudo_id, taken from the rows where it is known
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id) user_id
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' user_pseudo_id, NULL user_id UNION ALL
  SELECT 'a', NULL UNION ALL
  SELECT 'b', NULL UNION ALL
  SELECT 'c', NULL UNION ALL
  SELECT 'a', '111' UNION ALL
  SELECT 'b', '111' UNION ALL
  SELECT 'c', '222'
), map AS (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
)
SELECT user_pseudo_id, IFNULL(t.user_ID, m.user_id) user_id
FROM `project.dataset.table` t
LEFT JOIN map m
USING(user_pseudo_id)
with the result:
Row  user_pseudo_id  user_id
1    a               111
2    a               111
3    b               111
4    c               222
5    a               111
6    b               111
7    c               222
Finally, you can wrap this into UPDATE syntax, as in the example below:
#standardSQL
UPDATE `project.dataset.table` t
SET user_id = IFNULL(t.user_ID, map.user_id)
FROM (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id
Or you can restrict the update to only the rows where user_id is NULL, as in the example below:
#standardSQL
UPDATE `project.dataset.table` t
SET user_id = map.user_id
FROM (
  SELECT user_pseudo_id, MIN(user_id) user_id
  FROM `project.dataset.table`
  WHERE NOT user_id IS NULL
  GROUP BY user_pseudo_id
) map
WHERE t.user_pseudo_id = map.user_pseudo_id
  AND t.user_ID IS NULL
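Note that MIN(user_id) keeps only the smallest value if a user_pseudo_id was ever recorded with more than one user_id, so the backfill would silently assign that one. As a rough sketch (assuming the same placeholder table `project.dataset.table` used above; the output column names distinct_user_ids and user_ids are made up for illustration), a check like the following could be run before the UPDATE to list such pseudo IDs:

```sql
#standardSQL
-- List every user_pseudo_id that appears with more than one distinct
-- non-NULL user_id. For these rows the UPDATE above would assign the
-- smallest user_id, which may or may not be the one you want.
SELECT
  user_pseudo_id,
  COUNT(DISTINCT user_id) AS distinct_user_ids,            -- illustrative alias
  ARRAY_AGG(DISTINCT user_id ORDER BY user_id) AS user_ids -- illustrative alias
FROM `project.dataset.table`
WHERE user_id IS NOT NULL
GROUP BY user_pseudo_id
HAVING COUNT(DISTINCT user_id) > 1
```

With the sample data from the question this returns no rows, since a, b and c each map to exactly one user_id.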