时间加入Hive查询(事件在时间上接近)

时间:2017-07-21 15:06:11

标签: sql hadoop hive

我需要一个我很难搞清楚的配置单元查询。

我的时间序列如下:

time                          source    word1   word2    ...etc
2012-02-01 23:43:16.9988243    0001      2B3B    FAF0
2012-02-01 23:43:16.9993561    0002      2326    ABAA
2012-02-01 23:43:16.9998879    0002      2327    ABAA

我需要一个查询,如果一个source中的记录满足特定条件,除了该记录之外,它还应该及时返回一个或多个记录 从第二个source履行一系列不同的条件。

到目前为止,我的尝试看起来像这样:

SELECT time
    FROM messages C
    JOIN messages D on
       D.time
           BETWEEN C.time - INTERVAL '0.001' SECOND
           AND     C.time + INTERVAL '0.001' SECOND

    WHERE C.source = '0001'
    AND   D.Source = '0002'
    AND   C.word1 = '2B3B'
    AND   D.word2 = 'ABAA'

哪个应该返回上面的示例数据中的第一个和第二个记录(它不应该返回第三个,因为时间远远超过.001秒)。

但查询不起作用。错误消息是

FAILED: SemanticException '0.001' encountered with 0 children

1 个答案:

答案 0 :(得分:3)

这将是一个天真的解决方案:

select  *

from                    messages c
        cross join      messages m 

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'

;
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
|            time            | source | word1 | word2 |            time            | source | word1 | word2 |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |   0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+

这是性能良好的解决方案

select  *

from                    messages c

        join            messages m

        on              floor (cast(c.time as decimal(37,7)) / (2 * 0.001))   =
                        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'

union all

select  *

from                    messages c

        join            messages m

        on              floor ((cast(c.time as decimal(37,7)) + 0.001) / (2 * 0.001))   =
                        floor ((cast(m.time as decimal(37,7)) + 0.001) / (2 * 0.001))

where   floor (cast(c.time as decimal(37,7)) / (2 * 0.001))     <>
        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

    and m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
|            time            | source | word1 | word2 |           _col4            | _col5 | _col6 | _col7 |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |  0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+

插图

事件A和B将被UNION ALL的上半部分抓住 事件B和C将被UNION ALL的下半部分捕获。

    0        0.002    0.004    0.006    0.008    0.01      
    |        |        |        |        |        |
-------------------------------------------------------
                      |        |
                      |        |
                          A  B  C
                           |        |
                           |        |
-------------------------------------------------------
         |        |        |        |        |                
         0.001    0.003    0.005    0.007    0.009