Full join between multiple streams from different sources

Time: 2019-01-07 12:12:13

Tags: hazelcast-jet

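I am using Hazelcast Jet 0.6.1 for real-time analytics. There are multiple streams (mostly from remote logs) coming from different sources.

I would like to know whether a full join between multiple streams is supported.

If so, could you point me to some links or examples for implementing a full join between multiple streams?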

2 answers:

Answer 0: (score: 0)

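I think you need to elaborate on what you are trying to do. In theory, streams are infinite, so the term "full join" must mean something different from what it means in a database.

Jet has several types of joins. As Can said, there is a merge operator, but you may want to think more along the lines of a windowed join, where the join is bounded to a period of time.

Merging streams is described here: https://docs.hazelcast.org/docs/jet/0.7.2/manual/#merge

Windowing concepts are described here: https://docs.hazelcast.org/docs/jet/0.7.2/manual/#unbounded-stream-processing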
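To make the idea of a time-bounded join concrete, here is a minimal sketch in plain Java. It does not use the Jet API; the `Event` class, the `fullJoin` method, and the choice of tumbling (fixed, non-overlapping) windows are all assumptions made for illustration. Events from two inputs are grouped by window and key, and a row is emitted for every group, with `null` standing in for a side that has no event in that window. That is roughly what "full join" can mean once the stream is cut into finite windows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowedFullJoin {

    /** A timestamped, keyed event (hypothetical shape, not a Jet type). */
    static final class Event {
        final long timestamp;
        final String key;
        final String value;

        Event(long timestamp, String key, String value) {
            this.timestamp = timestamp;
            this.key = key;
            this.value = value;
        }
    }

    /** Values seen on each side for one (window, key) pair. */
    static final class Group {
        final List<String> left = new ArrayList<>();
        final List<String> right = new ArrayList<>();
    }

    /** Full outer join of two event lists per tumbling window and key. */
    public static List<String> fullJoin(List<Event> left, List<Event> right, long windowSize) {
        // Sorted maps keep the output ordered by window start, then by key.
        TreeMap<Long, TreeMap<String, Group>> windows = new TreeMap<>();
        for (Event e : left) {
            groupFor(windows, e, windowSize).left.add(e.value);
        }
        for (Event e : right) {
            groupFor(windows, e, windowSize).right.add(e.value);
        }

        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, TreeMap<String, Group>> w : windows.entrySet()) {
            for (Map.Entry<String, Group> g : w.getValue().entrySet()) {
                // An empty side is padded with null so unmatched events still yield a row.
                for (String l : orNull(g.getValue().left)) {
                    for (String r : orNull(g.getValue().right)) {
                        rows.add(w.getKey() + "|" + g.getKey() + "|" + l + "|" + r);
                    }
                }
            }
        }
        return rows;
    }

    private static Group groupFor(TreeMap<Long, TreeMap<String, Group>> windows,
                                  Event e, long windowSize) {
        // Start of the tumbling window that contains the event's timestamp.
        long windowStart = e.timestamp - Math.floorMod(e.timestamp, windowSize);
        return windows.computeIfAbsent(windowStart, k -> new TreeMap<>())
                      .computeIfAbsent(e.key, k -> new Group());
    }

    private static List<String> orNull(List<String> values) {
        return values.isEmpty() ? Collections.<String>singletonList(null) : values;
    }

    /** Small made-up data set: key "a" matches, "b" and "c" are one-sided. */
    public static List<String> demo() {
        List<Event> left = Arrays.asList(new Event(100, "a", "L1"), new Event(1200, "b", "L2"));
        List<Event> right = Arrays.asList(new Event(900, "a", "R1"), new Event(1500, "c", "R2"));
        return fullJoin(left, right, 1000);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With the sample data in `demo()`, key `a` has events on both sides inside the first window and is joined, while `b` and `c` each appear on one side only and are emitted with `null` for the missing side. A real streaming engine would also have to decide when a window is complete (for example with watermarks), which this batch-style sketch leaves out.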

Answer 1: (score: 0)

*This is in response to the comment on the first answer. It is too large for another comment, and I think the first answer is still relevant.

Is this the same data and data type, just from different nodes, like app servers in a microservices architecture? It seems to me that you have a few options here, and the choice really comes down to your preferred overall architecture, especially how you want to transport the events. A couple of thoughts:

  1. You can simply merge streams from different data sources if that fits the use case:

See: https://docs.hazelcast.org/docs/jet/0.7.2/manual/#merge

  2. If this is homogeneous data that is simply distributed across app servers, it might be a case where you use the Hazelcast client on each app server to put events into an IMap (shared by all the app servers) with an Event Journal enabled on a Hazelcast cluster. Jet then just receives all the events from the Event Journal.

See: https://docs.hazelcast.org/docs/latest/manual/html-single/#event-journal

  3. If you have Kafka available, you could create a topic for the events from the servers and have Jet receive the events from Kafka. Either way they are already merged when Jet gets them, so they are processed as one stream.

See: https://docs.hazelcast.org/docs/jet/0.7.2/manual/#kafka