I'm trying to build an Apache Flink streaming job that does some very simple computation on IoT data. It consumes from RabbitMQ (source), so it uses the RMQSource. That works fine, and parsing the data works fine as well.
However, the functions applied afterwards to this parsed data stream - of type Triplet<String, Double, Long> (sensor ID, PM2.5 value, timestamp) - behave strangely.
First, I want to key the stream by sensor ID.
Second, I want to create windows of 10 or 15 seconds that contain all elements for that key.
Third, a very basic ProcessWindowFunction should run on each window and simply count the elements in it => basically just like the example in the documentation.
Finally, the output of the ProcessWindowFunction should be printed to stdout.
You can see the relevant parts below. I'm using JMeter with the MQTT and KafkaMeter plugins to send test data, pushing around 50 requests at a time through it to see what happens.
When I send 10 requests, the result looks like this:
nope :(
nope :(
nope :(
nope :(
nope :(
nope :(
nope :(
nope :(
nope :(
nope :(
To my understanding this means that the ProcessWindowFunction is being evaluated for every single value instead of once per window.
My question is: I use
.window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
with
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
enabled -> so I expected it to work, but it doesn't. Any help is greatly appreciated.
extractedDataStream
.keyBy(t -> t.getValue0()) // keyed by sensor IDs
//.timeWindow(Time.seconds(10))
.window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
.process(new DetectTooHighAirPollution())
.print();
// execute program
env.execute("MQTT Detection StreamingJob");
}
public static class DetectTooHighAirPollution
extends ProcessWindowFunction<Triplet<String, Double, Long>, String, String, TimeWindow> {
@Override
public void process(String key, Context context, Iterable<Triplet<String, Double, Long>> input, Collector<String> out) throws IOException {
long count = 0;
for (Triplet<String, Double, Long> i : input) {
count++;
}
if (count > 1) {
out.collect("yap :D!: " + count);
} else {
out.collect("nope :(");
}
}
}
}
For the sake of completeness, here is the rest of the code, which does what it is supposed to do:
PS: I'm sending MQTT messages whose payload is a JSON object, which I currently parse "by hand".
PPS: The configuration details have been removed.
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.rabbitmq.RMQSource;
import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;
import org.apache.flink.util.Collector;
import org.javatuples.Triplet;
import java.io.IOException;
public class StreamingJob {
public static void main(String[] args) throws Exception {
// set up the streaming execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//env.setParallelism(1);
// Set up a configuration for the RabbitMQ Source
final RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
.setHost("")
.setPort()
.setUserName("")
.setPassword("")
.setVirtualHost("")
.build();
// Initiating a Data Stream from RabbitMQ
final DataStream<String> RMQstream = env
.addSource(new RMQSource<String>(
connectionConfig, // config for the RabbitMQ connection
"", // name of the RabbitMQ queue to consume
false, // use correlation ids; can be false if only at-least-once is required
new SimpleStringSchema())) // deserialization schema to turn messages into Java objects
.setParallelism(1); // run the source with a parallelism of 1
//Extraction of values of the Data Stream
final DataStream<Triplet<String, Double, Long>> extractedDataStream = RMQstream.map(
new RichMapFunction<String, Triplet<String, Double, Long>>() {
@Override
public Triplet<String, Double, Long> map(String s) throws Exception {
// Extract the payload of the message
String[] input = s.split(",");
// Extract the sensor ID
String sensorID = input[1];
String unformattedID = sensorID.split(":")[1];
String id = unformattedID.replaceAll(" ", "");
// Extract longitude
String sensorLONG = input[2];
String unformattedLONGTD = sensorLONG.split(":")[1];
String longtd = unformattedLONGTD.replaceAll(" ", "");
// Extract latitude
String sensorLAT = input[3];
String unformattedLATD = sensorLAT.split(":")[1];
String latd = unformattedLATD.replaceAll(" ", "");
// Extract the particulate matter
String sensorPM2 = input[6];
String unformattedPM2 = sensorPM2.split(":")[1];
String pm2String = unformattedPM2.replaceAll("[ }]+", "");
double pm2 = Double.valueOf(pm2String).doubleValue();
long ts = System.currentTimeMillis();
Triplet<String, Double, Long> sensorData = Triplet.with(id, pm2, ts);
return sensorData;
}
}
);
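As a side note on the PS above: the same extraction could also be done with a JSON parser instead of splitting strings by hand. Below is a rough, hypothetical sketch assuming Jackson is available on the classpath; the field names "sensorID" and "P2" are my assumptions about the payload layout, not taken from the actual messages.
// Additional imports this sketch would need:
// import com.fasterxml.jackson.databind.JsonNode;
// import com.fasterxml.jackson.databind.ObjectMapper;
// import org.apache.flink.configuration.Configuration;
final DataStream<Triplet<String, Double, Long>> parsedStream = RMQstream.map(
        new RichMapFunction<String, Triplet<String, Double, Long>>() {
            private transient ObjectMapper mapper;

            @Override
            public void open(Configuration parameters) {
                mapper = new ObjectMapper(); // create the parser once per task, not per record
            }

            @Override
            public Triplet<String, Double, Long> map(String s) throws Exception {
                JsonNode payload = mapper.readTree(s);
                String id = payload.get("sensorID").asText(); // assumed field name
                double pm2 = payload.get("P2").asDouble();    // assumed field name
                return Triplet.with(id, pm2, System.currentTimeMillis());
            }
        });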
Thanks again, and I hope someone has run into this before or can point out the (perhaps obvious) mistake I'm making.
I was able to find a solution to my problem. I had misunderstood and misapplied the concept of keyed streams. For my use case - getting a single result per window when applying a ProcessWindowFunction - I didn't need keyed streams at all. Instead, I had to use the code below.
In my case, '.keyBy' actually creates a window per sensor ID. So when 100 sensors (100 different IDs) send requests within a very short interval (milliseconds), I get 100 windows and 100 ProcessWindowFunction results.
That is not what I wanted here, so I had to use the '.windowAll' operation to get a single window containing all elements of the stream. After that I had to apply a 'ProcessAllWindowFunction' instead of a 'ProcessWindowFunction', and so on: and it worked! :D
...
extractedDataStream
//.filter(t -> t.getValue1() > 30) // This is just use-case specific => 71 of the 100 sensor requests have a value higher than 30.
.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(15)))
.process(new DetectTooHighAirPollution())
.print();
...
public static class DetectTooHighAirPollution
extends ProcessAllWindowFunction<Triplet<String, Double, Long>, String, TimeWindow> {
@Override
public void process(Context context, Iterable<Triplet<String, Double, Long>> input, Collector<String> out) throws IOException {
long count = 0;
for (Triplet<String, Double, Long> i : input) {
count++;
}
if (count >= 10) {
out.collect(count + " sensors report a too high concentration of PM2!");
} else {
out.collect("Oops, something went wrong :/");
}
}
}
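A possible refinement (not required for the fix above): since the window is only used for counting, the count can also be computed incrementally with an AggregateFunction, so Flink keeps a single running Long per window instead of buffering every Triplet. A sketch, assuming the same extractedDataStream and an extra import of org.apache.flink.api.common.functions.AggregateFunction:
extractedDataStream
        .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(15)))
        .aggregate(
                // incremental part: keeps only a running count as window state
                new AggregateFunction<Triplet<String, Double, Long>, Long, Long>() {
                    @Override
                    public Long createAccumulator() { return 0L; }

                    @Override
                    public Long add(Triplet<String, Double, Long> value, Long acc) { return acc + 1; }

                    @Override
                    public Long getResult(Long acc) { return acc; }

                    @Override
                    public Long merge(Long a, Long b) { return a + b; }
                },
                // window part: receives exactly one pre-aggregated count per window
                new ProcessAllWindowFunction<Long, String, TimeWindow>() {
                    @Override
                    public void process(Context context, Iterable<Long> counts, Collector<String> out) {
                        long count = counts.iterator().next();
                        out.collect(count + " sensor readings in this window");
                    }
                })
        .print();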
Cheers!