Question: the problem is that this program writes to Kafka several times per window (creating 2-3 or more lines per window), while it should create exactly 1 line per window, since the reduce function only allows a single element. I have the same code written in Spark and it works perfectly. I have been trying to find information about this issue, but I have not found anything. I have also tried changing the parallelism of some of the functions, and more things, and nothing worked; I cannot figure out where the problem is.
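For context, this is the kind of parallelism change I tried (a sketch over the final map operator from the code below; setting it on other operators looked the same, and none of it removed the duplicates):

//Pin a single parallel instance of the last map operator (sketch of what I tried)
DataStream<String> averageTS = wL_Average
.map(new TimestampAdder())
.setParallelism(1);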
I am testing Flink latency. Here is the environment of my problem:
Cluster: I am using Flink 1.2.0 and OpenJDK 8. I have 3 machines: 1 JobManager and 2 TaskManagers (4 cores and 2 GB RAM each, with 4 task slots per TaskManager).
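For reference, the slot count and the JobManager address for a setup like this are configured in flink-conf.yaml on each machine; a sketch of the relevant entries (the host name and heap size are placeholder assumptions, only the slot count comes from the description above):

jobmanager.rpc.address: jobmanager-host
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 4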
Input data: lines generated by a Java producer into a 24-partition Kafka topic, each line containing two space-separated elements: an incremental value and its creation timestamp in milliseconds.
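A minimal sketch of such a producer (an illustration only: the topic name inputTopic and the send rate are assumptions on my part; the broker address and the line format, value plus creation timestamp, match the job below):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LineProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.0.155:9092");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long value = 0;
            while (true) {
                //Line format expected by the Flink job: "<incremental value> <creation timestamp>"
                String line = value + " " + System.currentTimeMillis();
                producer.send(new ProducerRecord<>("inputTopic", line)); //"inputTopic" is a placeholder name
                value++;
                Thread.sleep(1); //roughly matches the sub-1000 records/s rate mentioned below
            }
        }
    }
}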
My Java class: the filter functions, together with the union, are useless; I use them only to check their latency. Basically, a 1 is added to each line; after that there is a 2-second tumbling window, and the reduce function adds up all those 1s and all the creation timestamps. The summed timestamp is later divided, in a map function, by the sum of 1s, which gives the average timestamp (for example, if a window reduces 3 lines whose timestamps add up to T, the average is T/3). Finally, in a last map function, the current timestamp is appended to each reduced line, together with the difference between this timestamp and the average. Each of these lines is written to Kafka (to a 2-partition topic).
import java.util.Properties;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.AllWindowedStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010.FlinkKafkaProducer010Configuration;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class TimestampLongKafka {

public static void main(String[] args) throws Exception {
//FLINK CONFIGURATION
final StreamExecutionEnvironment env = StreamExecutionEnvironment
.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//KAFKA CONSUMER CONFIGURATION
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "192.168.0.155:9092");
FlinkKafkaConsumer010<String> myConsumer = new FlinkKafkaConsumer010<>(args[0], new SimpleStringSchema(), properties);
//KAFKA PRODUCER
Properties producerConfig = new Properties();
producerConfig.setProperty("bootstrap.servers", "192.168.0.155:9092");
producerConfig.setProperty("acks", "0");
producerConfig.setProperty("linger.ms", "0");
//MAIN PROGRAM
//Read from Kafka
DataStream<String> line = env.addSource(myConsumer);
//Add 1 to each line
DataStream<Tuple2<String, Integer>> line_Num = line.map(new NumberAdder());
//Filter odd numbers
DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd());
//Filter even numbers
DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven());
//Union the even and odd streams back together
DataStream<Tuple2<String, Integer>> line_Num_U = line_Num_Odd.union(line_Num_Even);
//Tumbling windows every 2 seconds
AllWindowedStream<Tuple2<String, Integer>, TimeWindow> windowedLine_Num_U = line_Num_U
.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(2)));
//Reduce to one line with the sum
DataStream<Tuple2<String, Integer>> wL_Num_U_Reduced = windowedLine_Num_U.reduce(new Reducer());
//Calculate the average of the elements summed
DataStream<String> wL_Average = wL_Num_U_Reduced.map(new AverageCalculator());
//Add timestamp and calculate the difference with the average
DataStream<String> averageTS = wL_Average.map(new TimestampAdder());
//Send the result to Kafka
FlinkKafkaProducer010Configuration<String> myProducerConfig = (FlinkKafkaProducer010Configuration<String>) FlinkKafkaProducer010
.writeToKafkaWithTimestamps(averageTS, "testRes", new SimpleStringSchema(), producerConfig);
myProducerConfig.setWriteTimestampToKafka(true);
env.execute("TimestampLongKafka");
}
//Functions used in the program implementation:
public static class FilterOdd implements FilterFunction<Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
@Override
public boolean filter(Tuple2<String, Integer> line) throws Exception {
//Keep only lines whose incremental value is odd
boolean isOdd = (Long.valueOf(line.f0.split(" ")[0]) % 2) != 0;
return isOdd;
}
};
public static class FilterEven implements FilterFunction<Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
@Override
public boolean filter(Tuple2<String, Integer> line) throws Exception {
//Keep only lines whose incremental value is even
boolean isEven = (Long.valueOf(line.f0.split(" ")[0]) % 2) == 0;
return isEven;
}
};
public static class NumberAdder implements MapFunction<String, Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
public Tuple2<String, Integer> map(String line) {
Tuple2<String, Integer> newLine = new Tuple2<String, Integer>(line, 1);
return newLine;
}
};
public static class Reducer implements ReduceFunction<Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
@Override
public Tuple2<String, Integer> reduce(Tuple2<String, Integer> line1, Tuple2<String, Integer> line2) throws Exception {
//Sum the incremental values and the creation timestamps; f1 counts how many lines were reduced
Long sum = Long.valueOf(line1.f0.split(" ")[0]) + Long.valueOf(line2.f0.split(" ")[0]);
Long sumTS = Long.valueOf(line1.f0.split(" ")[1]) + Long.valueOf(line2.f0.split(" ")[1]);
Tuple2<String, Integer> newLine = new Tuple2<String, Integer>(sum + " " + sumTS,
line1.f1 + line2.f1);
return newLine;
}
};
public static class AverageCalculator implements MapFunction<Tuple2<String, Integer>, String> {
private static final long serialVersionUID = 1L;
@Override
public String map(Tuple2<String, Integer> line) throws Exception {
//Divide the summed timestamps by the element count to get the average creation timestamp
Long average = Long.valueOf(line.f0.split(" ")[1]) / line.f1;
String result = line.f1 + " " + average;
return result;
}
};
public static final class TimestampAdder implements MapFunction<String, String> {
private static final long serialVersionUID = 1L;
@Override
public String map(String line) throws Exception {
//Append the current timestamp and the latency (current time minus the average creation timestamp)
Long currentTime = System.currentTimeMillis();
String totalTime = String.valueOf(currentTime - Long.valueOf(line.split(" ")[1]));
String newLine = line.concat(" " + currentTime + " " + totalTime);
return newLine;
}
};
}
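To quantify how many lines actually arrive per window, the result topic can be read back with a plain Kafka consumer that counts records in 2-second buckets. A minimal sketch (the group id is a placeholder; the topic name testRes and the broker address are the ones used above):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WindowOutputCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.0.155:9092");
        props.setProperty("group.id", "window-output-counter"); //placeholder group id
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("testRes"));
            long bucketStart = System.currentTimeMillis();
            int count = 0;
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                count += records.count();
                if (System.currentTimeMillis() - bucketStart >= 2000) {
                    //With one reduced line per window this should print 1; in my runs it prints 3
                    System.out.println("lines in last 2s interval: " + count);
                    bucketStart = System.currentTimeMillis();
                    count = 0;
                }
            }
        }
    }
}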
Some output data: this output was written to the 2-partition topic; the generation rate was below 1,000 records/s, and in this case 3 output lines were created per window instead of 1.
Thanks in advance!
Answer 0 (score: 0)
I do not know the exact cause, but I was able to work around the problem by stopping the Flink cluster and restarting it. After several job executions it starts to produce more output, apparently tripled (x3), and the problem may keep growing. I will open an issue on Jira and update this as soon as possible.