Associating usage records with the corresponding usage plan in BigQuery

Time: 2019-07-09 09:48:30

Tags: google-bigquery

Resource usage by customer:

+-------+-------------+-----------------------+
| usage | customer_id |  timestamp            |
+-------+-------------+-----------------------+
| 10    | 1           |  2019-01-12T01:00:00  |
| 16    | 1           |  2019-02-12T02:00:00  |
| 26    | 1           |  2019-03-12T03:00:00  |
| 24    | 1           |  2019-04-12T04:00:00  |
| 4     | 1           |  2019-05-15T01:00:00  |
+-------+-------------+-----------------------+

This table shows the usage reported for each customer per hour. Minutes and seconds are always zero.

Customer plan change log:

+--------+-------------+-----------------------+
| plan   | customer_id |  timestamp            |
+--------+-------------+-----------------------+
| A      | 1           |  2018-12-12T01:24:00  |
| B      | 1           |  2019-01-12T02:31:00  |
| C      | 1           |  2019-03-12T03:53:00  |
+--------+-------------+-----------------------+

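Whenever a customer changes their usage plan, the change is stored in the change log.

Desired result: every usage record associated with its usage plan.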

+-------+-------------+--------+-----------------------+
| usage | customer_id |  plan  |  timestamp            |
+-------+-------------+--------+-----------------------+
| 10    | 1           |  A     |  2019-01-05T01:00:00  |
| 16    | 1           |  B     |  2019-02-12T02:00:00  |
| 26    | 1           |  C     |  2019-03-10T03:00:00  |
| 24    | 1           |  C     |  2019-04-12T04:00:00  |
| 4     | 1           |  C     |  2019-05-15T01:00:00  |
+-------+-------------+--------+-----------------------+

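What I have tried: to determine the plan for a particular usage record, take the record's timestamp and look up the most recent plan change in the plan change log at or before that time: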

SELECT
  customer_id,
  plan,
  timestamp,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) seqnum
FROM
  `project.dataset.table`
WHERE seqnum = 1 AND timestamp <= timestamp_of_the_usage_record

But I am not sure how to combine this with the usage table. I tried:

WITH log AS (
  SELECT
      customer_id,
      plan,
      timestamp,
      ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) seqnum
    FROM
      `project.dataset.plan_change_log`
)
SELECT
  t1.customer_id,
  log.plan,
  t1.usage,
  t1.timestamp
FROM
  `project.dataset.usage` t1
FULL JOIN log
ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp AND seqnum = 1

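Because of the join condition, the result table has fewer rows than the original usage table, but the row count should stay the same. Any ideas on how to solve this?

1 answer:

Answer 0: (score: 2)

You are on the right track, although the data in your example is slightly off for the first and third rows of the expected result.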

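WITH data AS (
  SELECT
    t1.customer_id,
    log.plan,
    t1.usage,
    t1.timestamp,
    log.timestamp AS logt,
    -- Rank the matching plan changes for each usage record, newest first.
    ROW_NUMBER() OVER (
      PARTITION BY t1.customer_id, t1.timestamp
      ORDER BY log.timestamp DESC
    ) AS seqnum
  FROM
    resource t1  -- the usage table
  -- LEFT JOIN keeps every usage row and adds no rows for plan changes without usage.
  LEFT JOIN log  -- the plan change log
    ON log.customer_id = t1.customer_id
    AND log.timestamp <= t1.timestamp
)
-- Keep only the latest plan change at or before each usage record.
SELECT * FROM data WHERE seqnum = 1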

You want to create the sequence number on the result of the join, not before it.
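
The rule behind the query ("the latest plan change at or before the usage timestamp") can be sanity-checked outside BigQuery. The following is a minimal Python sketch, not part of the original answer, that applies the rule to the sample data from the question. It uses a sorted list and a binary search instead of a join with ROW_NUMBER(); the function name `plan_at` and the in-memory lists are illustrative assumptions.

```python
from bisect import bisect_right
from datetime import datetime

# Sample data from the question (customer 1 only).
usage = [
    (10, 1, "2019-01-12T01:00:00"),
    (16, 1, "2019-02-12T02:00:00"),
    (26, 1, "2019-03-12T03:00:00"),
    (24, 1, "2019-04-12T04:00:00"),
    (4, 1, "2019-05-15T01:00:00"),
]
plan_changes = [
    ("A", 1, "2018-12-12T01:24:00"),
    ("B", 1, "2019-01-12T02:31:00"),
    ("C", 1, "2019-03-12T03:53:00"),
]


def plan_at(customer_id, ts, changes):
    """Hypothetical helper: plan in effect at ts, or None if no change precedes it."""
    # Plan changes for this customer, sorted by time.
    history = sorted(
        (datetime.fromisoformat(t), plan)
        for plan, cid, t in changes
        if cid == customer_id
    )
    times = [t for t, _ in history]
    # Number of changes at or before ts; the last of them is the active plan.
    i = bisect_right(times, datetime.fromisoformat(ts))
    return history[i - 1][1] if i else None


# One output row per usage row, so the row count never changes.
result = [(u, cid, plan_at(cid, ts, plan_changes), ts) for u, cid, ts in usage]
for row in result:
    print(row)
```

With this sample data the plans come out as A, B, B, C, C. The third record (2019-03-12T03:00:00) falls before the change to plan C at 03:53, so it maps to B. A usage record with no earlier plan change yields `None`, which corresponds to a NULL plan in a left join.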