Storing time-based data and finding the gaps

Time: 2019-02-21 11:12:40

Tags: sql nosql google-bigquery bigdata

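I have a few million sensors that continuously run health checks and send data to a server every 5 minutes. My task is to store these data points and generate an hourly report of the sensors that failed to report.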

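Questions:

  • Which database is best suited to this kind of workload (SQL or NoSQL), and which one specifically? The index will be a string.
  • What would be the best query for the chosen database?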

Sample data:

"point1"    "12-2-19T00:00"
"point2"    "12-2-19T00:00"
"point1"    "12-2-19T00:05" #missing point2
"point1"    "12-2-19T00:10"
"point2"    "12-2-19T00:10"

I need to find point2 (the sensor that missed its 00:05 report).

1 answer:

Answer 0: (score: 1)

Below is for BigQuery Standard SQL

#standardSQL
WITH temp AS (
  SELECT point, PARSE_TIMESTAMP('%d-%m-%yT%H:%M', dt) dt
  FROM `project.dataset.table`
), points AS (
  SELECT DISTINCT point FROM temp
), times AS (
  SELECT dt
  FROM (SELECT MIN(dt) min_dt, MAX(dt) max_dt FROM temp), 
  UNNEST(GENERATE_TIMESTAMP_ARRAY(min_dt, max_dt, INTERVAL 5 MINUTE)) dt
)
SELECT 
  point, 
  FORMAT_DATETIME('%d-%m-%yT%H:%M', DATETIME(dt)) dt, 
  IF(t.point IS NULL, 'missing', 'ok') status
FROM times CROSS JOIN points 
LEFT JOIN temp t USING(dt, point)

You can test the above using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'point1' point, '12-2-19T00:00' dt UNION ALL
  SELECT 'point2', '12-2-19T00:00' UNION ALL
  SELECT 'point1', '12-2-19T00:05' UNION ALL -- #missing point2
  SELECT 'point1', '12-2-19T00:10' UNION ALL
  SELECT 'point2', '12-2-19T00:10' 
), temp AS (
  SELECT point, PARSE_TIMESTAMP('%d-%m-%yT%H:%M', dt) dt
  FROM `project.dataset.table`
), points AS (
  SELECT DISTINCT point FROM temp
), times AS (
  SELECT dt
  FROM (SELECT MIN(dt) min_dt, MAX(dt) max_dt FROM temp), 
  UNNEST(GENERATE_TIMESTAMP_ARRAY(min_dt, max_dt, INTERVAL 5 MINUTE)) dt
)
SELECT 
  point, 
  FORMAT_DATETIME('%d-%m-%yT%H:%M', DATETIME(dt)) dt, 
  IF(t.point IS NULL, 'missing', 'ok') status
FROM times CROSS JOIN points 
LEFT JOIN temp t USING(dt, point)
-- ORDER BY dt, point   

with result

Row point   dt              status   
1   point1  12-02-19T00:00  ok   
2   point2  12-02-19T00:00  ok   
3   point1  12-02-19T00:05  ok   
4   point2  12-02-19T00:05  missing  
5   point1  12-02-19T00:10  ok   
6   point2  12-02-19T00:10  ok
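
If it helps to see the logic outside BigQuery, here is a minimal Python sketch of the same idea: build every expected (point, time) pair between the earliest and latest timestamps, then keep the pairs that never arrived. The function name `find_gaps` and the in-memory `rows` list are assumptions made for illustration only. They are not part of the query above, and this approach is not meant for millions of sensors.

```python
from datetime import datetime, timedelta

# Same layout as the PARSE_TIMESTAMP format string used in the query.
FMT = "%d-%m-%yT%H:%M"


def find_gaps(rows, step=timedelta(minutes=5)):
    """Return (point, formatted time) pairs that are expected but absent.

    rows is an iterable of (point, timestamp string) tuples (hypothetical input).
    """
    seen = {(point, datetime.strptime(dt, FMT)) for point, dt in rows}
    if not seen:
        return []
    points = sorted({point for point, _ in seen})
    times = [t for _, t in seen]
    t, end = min(times), max(times)
    missing = []
    # Walk the 5-minute grid (the "times" CTE) crossed with all points
    # (the "points" CTE) and report pairs with no matching row.
    while t <= end:
        for point in points:
            if (point, t) not in seen:
                missing.append((point, t.strftime(FMT)))
        t += step
    return missing


rows = [
    ("point1", "12-2-19T00:00"),
    ("point2", "12-2-19T00:00"),
    ("point1", "12-2-19T00:05"),  # point2 is missing here
    ("point1", "12-2-19T00:10"),
    ("point2", "12-2-19T00:10"),
]
print(find_gaps(rows))  # [('point2', '12-02-19T00:05')]
```

Unlike the query, this sketch returns only the missing pairs, which is closer to what an hourly report of non-reporting sensors would contain.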