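I want to keep a history of user statuses.
For this I have a table with two columns: a user identifier (user_id) and status.
The user identifier is a string, and status is a repeated record of key:value pairs, each holding a date and a status value.
When a user changes status (for example, from active to inactive), I want to update this table and add the new status while keeping the old ones.
Here is the table schema: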
[
  {
    "description": "user identifier",
    "mode": "REQUIRED",
    "name": "user_id",
    "type": "STRING"
  },
  {
    "description": "status - can be either sent or pending, initial state is pending",
    "mode": "REPEATED",
    "name": "status",
    "type": "RECORD",
    "fields": [
      {
        "name": "status_date",
        "type": "DATE",
        "mode": "REQUIRED"
      },
      {
        "name": "value",
        "type": "STRING",
        "mode": "REQUIRED"
      }
    ]
  }
]
Is it even possible to insert a new user status with this schema? Should I redesign the schema? How do I make proper use of nested fields in BigQuery?
Answer 0 (score: 1)
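The following is for BigQuery Standard SQL. It assumes you have the statuses table project.dataset.statuses as described in the question, and an updates table project.dataset.updates that accumulates the updates to be applied periodically to the statuses table.
So the dummy data might look like this: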
WITH `project.dataset.statuses` AS (
  SELECT 'a' AS user_id, [STRUCT<status_date DATE, value STRING>('2018-11-03', 'pending')] AS status UNION ALL
  SELECT 'b', [STRUCT<status_date DATE, value STRING>('2018-11-04', 'pending')] UNION ALL
  SELECT 'c', []  -- user with no status history yet
), `project.dataset.updates` AS (
  SELECT 'a' AS user_id, [STRUCT<status_date DATE, value STRING>('2018-11-05', 'sent')] AS new_statuses UNION ALL
  SELECT 'c', [STRUCT<status_date DATE, value STRING>('2018-11-05', 'pending')]
)
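Here the updates table has the same structure as the main table (its repeated column is named new_statuses) and holds the new statuses that need to be added to the main table.
The SELECT below returns the concatenated statuses: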
#standardSQL
SELECT
  t.user_id,
  -- Users without an update keep their history; the others get the new statuses appended.
  IF(u.user_id IS NULL, t.status, ARRAY_CONCAT(t.status, u.new_statuses)) AS status
FROM `project.dataset.statuses` t
LEFT JOIN `project.dataset.updates` u
  ON t.user_id = u.user_id
You can use the DDL below to "update" the statuses table with that result:
#standardSQL
CREATE OR REPLACE TABLE `project.dataset.statuses` AS
SELECT
  t.user_id,
  -- Users without an update keep their history; the others get the new statuses appended.
  IF(u.user_id IS NULL, t.status, ARRAY_CONCAT(t.status, u.new_statuses)) AS status
FROM `project.dataset.statuses` t
LEFT JOIN `project.dataset.updates` u
  ON t.user_id = u.user_id
If applied to the dummy data above:
Statuses:
Row  user_id  status.status_date  status.value
1    a        2018-11-03          pending
2    b        2018-11-04          pending
3    c
Updates:
Row  user_id  new_statuses.status_date  new_statuses.value
1    a        2018-11-05                sent
2    c        2018-11-05                pending
The result will be:
Row  user_id  status.status_date  status.value
1    a        2018-11-03          pending
              2018-11-05          sent
2    b        2018-11-04          pending
3    c        2018-11-05          pending
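CREATE OR REPLACE TABLE rewrites the whole table on every run. When only a few users change, an in-place DML statement may be enough. The following is a minimal sketch, not part of the original answer. It assumes the same table and column names as above, and that project.dataset.updates holds at most one row per user_id, since BigQuery raises an error when a target row matches more than one source row in an UPDATE. Users that exist only in the updates table are not added by this statement.

```sql
#standardSQL
-- Sketch: append each user's new statuses in place instead of recreating the table.
-- Assumption: `project.dataset.updates` has at most one row per user_id.
UPDATE `project.dataset.statuses` t
SET status = ARRAY_CONCAT(t.status, u.new_statuses)
FROM `project.dataset.updates` u
WHERE t.user_id = u.user_id
```

On the dummy data this should leave user b untouched and append one entry each to users a and c, matching the result table above.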
If the updates table can contain new users that are not yet in the main table, the following handles that case:
#standardSQL
-- CREATE OR REPLACE TABLE `project.dataset.statuses` AS
SELECT
  IFNULL(t.user_id, u.user_id) AS user_id,
  CASE
    WHEN t.user_id = u.user_id THEN ARRAY_CONCAT(t.status, u.new_statuses)  -- existing user with updates
    WHEN t.user_id IS NULL THEN u.new_statuses                              -- new user, only in updates
    WHEN u.user_id IS NULL THEN t.status                                    -- existing user, no updates
  END AS status
FROM `project.dataset.statuses` t
FULL JOIN `project.dataset.updates` u
  ON t.user_id = u.user_id
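The same "append for existing users, insert for new users" logic can also be written as a single MERGE statement that modifies the table in place. This is a hedged sketch rather than the answer's own method. It reuses the table and column names from above and first collapses the updates table to one row per user with ARRAY_CONCAT_AGG, on the assumption that a user may appear there more than once (MERGE fails if a target row matches several source rows).

```sql
#standardSQL
-- Sketch: upsert status history in place.
MERGE `project.dataset.statuses` t
USING (
  -- Collapse multiple update rows per user into a single array.
  SELECT user_id, ARRAY_CONCAT_AGG(new_statuses) AS new_statuses
  FROM `project.dataset.updates`
  GROUP BY user_id
) u
ON t.user_id = u.user_id
WHEN MATCHED THEN
  -- Existing user: append the new statuses to the stored history.
  UPDATE SET status = ARRAY_CONCAT(t.status, u.new_statuses)
WHEN NOT MATCHED THEN
  -- New user: start the history with the incoming statuses.
  INSERT (user_id, status) VALUES (u.user_id, u.new_statuses)
```

Users that have no row in the updates table are left unchanged, which corresponds to the third CASE branch in the FULL JOIN query.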