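I couldn't come up with a good title for this question; sorry about that.
I have two tables, A and B. Both have a timestamp column, and they share a common ID. Here are the schemas of the two tables: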
Table A:
========
a_id int,
common_id int,
ts timestamp
...
Table B:
========
b_id int,
common_id int,
ts timestamp,
temperature int
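Table A is more like device data, with a row written each time a device changes state. Table B is more like IoT data, holding the device temperature every minute or so.
What I want to do is create a table C from these two tables. Table C is essentially table A plus the temperature from table B at the closest time.
How can I do this in BigQuery SQL alone? The temperature does not need to be exact.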
Answer 0 (score: 2)
The option below (for BigQuery Standard SQL) assumes that, in addition to temperature from table b, you also need all the other values from the matching row.
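#standardSQL
SELECT
  -- For each row of a, keep only the joined row of b with the smallest time gap.
  ARRAY_AGG(
    STRUCT(a_id, a.common_id, a.ts, b_id, b.ts AS b_ts, temperature)
    ORDER BY ABS(TIMESTAMP_DIFF(a.ts, b.ts, SECOND))
    LIMIT 1
  )[SAFE_OFFSET(0)].*
FROM `project.dataset.table_a` a
LEFT JOIN `project.dataset.table_b` b
  ON a.common_id = b.common_id
  -- Only consider readings less than 30 minutes away from a.ts.
  AND ABS(TIMESTAMP_DIFF(a.ts, b.ts, MINUTE)) < 30
-- One output row per row of a.
GROUP BY TO_JSON_STRING(a)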
I smoke-tested it with the dummy data generated below:
#standardSQL
WITH `project.dataset.table_a` AS (
  SELECT CAST(1000000 * RAND() AS INT64) a_id, common_id, ts
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2018-01-01 00:00:00', '2018-01-01 23:59:59', INTERVAL 45*60 + 27 SECOND)) ts
  CROSS JOIN UNNEST(GENERATE_ARRAY(1, 10)) common_id
), `project.dataset.table_b` AS (
  SELECT CAST(1000000 * RAND() AS INT64) b_id, common_id, ts, CAST(60 + 40 * RAND() AS INT64) temperature
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2018-01-01 00:00:00', '2018-01-01 23:59:59', INTERVAL 1 MINUTE)) ts
  CROSS JOIN UNNEST(GENERATE_ARRAY(1, 10)) common_id
)
SELECT
  ARRAY_AGG(
    STRUCT(a_id, a.common_id, a.ts, b_id, b.ts AS b_ts, temperature)
    ORDER BY ABS(TIMESTAMP_DIFF(a.ts, b.ts, SECOND))
    LIMIT 1
  )[SAFE_OFFSET(0)].*
FROM `project.dataset.table_a` a
LEFT JOIN `project.dataset.table_b` b
  ON a.common_id = b.common_id
  AND ABS(TIMESTAMP_DIFF(a.ts, b.ts, MINUTE)) < 30
GROUP BY TO_JSON_STRING(a)
Here are a few rows from the output:
Row  a_id    common_id  ts                       b_id    b_ts                     temperature
1    276623  1          2018-01-01 00:00:00 UTC  166995  2018-01-01 00:00:00 UTC  74
2    218354  1          2018-01-01 00:45:27 UTC  464901  2018-01-01 00:45:00 UTC  87
3    265634  1          2018-01-01 01:30:54 UTC  565385  2018-01-01 01:31:00 UTC  87
4    758075  1          2018-01-01 02:16:21 UTC  55894   2018-01-01 02:16:00 UTC  84
5    306355  1          2018-01-01 03:01:48 UTC  844429  2018-01-01 03:02:00 UTC  92
6    348502  1          2018-01-01 03:47:15 UTC  375859  2018-01-01 03:47:00 UTC  90
7    774920  1          2018-01-01 04:32:42 UTC  438164  2018-01-01 04:33:00 UTC  61
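Here, I set up table_b with per-minute temperatures for 10 devices over the whole day of '2018-01-01', and table_a with a state change every 45 minutes 27 seconds for the same 10 devices on the same day. a_id and b_id are just random numbers between 0 and 999999.
Note: the ABS(TIMESTAMP_DIFF(a.ts, b.ts, MINUTE)) < 30 condition in the JOIN clause controls the time window within which the closest ts is looked for (in case some IoT entries are missing from table_b).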
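To make the window logic easier to check outside BigQuery, here is a minimal Python sketch of the same idea. It is not the query itself: the function name `nearest_within`, the tuple layout, and the generated temperatures are assumptions made for illustration. It picks the reading with the smallest absolute time gap and ignores readings that are 30 minutes or more away.

```python
from datetime import datetime, timedelta


def nearest_within(a_ts, b_rows, window=timedelta(minutes=30)):
    """Return the (ts, temperature) row of b_rows closest to a_ts, or None.

    Rows whose absolute gap is not strictly below `window` are ignored,
    which approximates the JOIN condition. None plays the role of the
    NULLs that the LEFT JOIN produces when nothing matches.
    """
    candidates = [row for row in b_rows if abs(row[0] - a_ts) < window]
    if not candidates:
        return None
    return min(candidates, key=lambda row: abs(row[0] - a_ts))


day = datetime(2018, 1, 1)
# Hypothetical per-minute readings for one device (temperatures are made up).
b_rows = [(day + timedelta(minutes=m), 60 + m % 40) for m in range(24 * 60)]

# A state change at 00:45:27 is 27 s after 00:45:00 and 33 s before 00:46:00.
a_ts = day + timedelta(minutes=45, seconds=27)
match = nearest_within(a_ts, b_rows)
print(match[0])  # 2018-01-01 00:45:00

# With no reading inside the window, the result is None.
print(nearest_within(a_ts, b_rows[120:]))  # None
```

The first result lines up with row 2 of the sample output, where ts 00:45:27 is paired with b_ts 00:45:00.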
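Answer 1 (score: 1)
Measure closeness with ABS(TIMESTAMP_DIFF(a.ts, b.ts, SECOND)); taking the absolute value gives the closest time in either direction:
WITH a AS (
  SELECT 1 id, TIMESTAMP('2018-01-01 11:01:00') ts
  UNION ALL SELECT 1, TIMESTAMP('2018-01-02 10:00:00')
  UNION ALL SELECT 2, TIMESTAMP('2018-01-02 10:00:00')
)
, b AS (
  SELECT 1 id, TIMESTAMP('2018-01-01 12:01:00') ts, 43 temp
  UNION ALL SELECT 1, TIMESTAMP('2018-01-01 12:06:00'), 47
)
SELECT *,
  -- Correlated subquery: for each row of a, the temp of the closest row of b with the same id.
  (SELECT temp
   FROM b
   WHERE a.id = b.id
   ORDER BY ABS(TIMESTAMP_DIFF(a.ts, b.ts, SECOND))
   LIMIT 1) temp
FROM a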
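As a rough cross-check of what this per-row lookup should return, the same sample rows can be run through a small Python sketch. The helper name `closest_temp` is hypothetical, and this only mirrors the logic of the correlated subquery, not how BigQuery executes it.

```python
from datetime import datetime

# Same sample rows as in the query: (id, ts) for a, (id, ts, temp) for b.
a = [
    (1, datetime(2018, 1, 1, 11, 1)),
    (1, datetime(2018, 1, 2, 10, 0)),
    (2, datetime(2018, 1, 2, 10, 0)),
]
b = [
    (1, datetime(2018, 1, 1, 12, 1), 43),
    (1, datetime(2018, 1, 1, 12, 6), 47),
]


def closest_temp(a_id, a_ts, b_rows):
    """Return the temp of the b row with the same id and the smallest absolute gap."""
    gaps = [
        (abs((a_ts - b_ts).total_seconds()), temp)
        for b_id, b_ts, temp in b_rows
        if b_id == a_id
    ]
    if not gaps:
        return None  # like the NULL a scalar subquery returns when no row matches
    return min(gaps, key=lambda pair: pair[0])[1]


temps = [closest_temp(a_id, a_ts, b) for a_id, a_ts in a]
print(temps)  # [43, 47, None]
```

The row with id 2 has no counterpart in b, so it gets no temperature. Unlike the first answer, nothing here limits how far away the closest reading may be.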