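I am using the Flink DataStream API, where there are racks available, and I want to calculate the average temperature grouped by rack ID. My window duration is 40 seconds and the window slides every 10 seconds. Below is my code, which calculates the sum of the temperatures every 10 seconds, but now I want to calculate the average temperature instead: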
static Properties properties = new Properties();

public static Properties getProperties()
{
    properties.setProperty("bootstrap.servers", "54.164.200.104:9092");
    properties.setProperty("zookeeper.connect", "54.164.200.104:2181");
    //properties.setProperty("deserializer.class", "kafka.serializer.StringEncoder");
    //properties.setProperty("group.id", "akshay");
    properties.setProperty("auto.offset.reset", "earliest");
    return properties;
}

@SuppressWarnings("rawtypes")
public static void main(String[] args) throws Exception
{
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    Properties props = Program.getProperties();
    DataStream<TemperatureEvent> dstream = env
        .addSource(new FlinkKafkaConsumer09<TemperatureEvent>("TemperatureEvent", new TemperatureEventSchema(), props))
        .assignTimestampsAndWatermarks(new IngestionTimeExtractor<>());
    DataStream<TemperatureEvent> ds1 = dstream
        .keyBy("rackId")
        .timeWindow(Time.seconds(40), Time.seconds(10))
        .sum("temperature");
    env.execute("Temperature Consumer");
}
How can I calculate the average temperature for the example above?
Answer 0 (score: 4)
As far as I know, you need to write the average function yourself. You can find an example here:
In your case, you would probably replace
.sum("temperature");
with
.apply(new Avg());
and implement the Avg class:
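import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// keyBy("rackId") yields a Tuple key and timeWindow(...) yields TimeWindow,
// so the function is typed on those rather than on Long and Window.
public class Avg implements WindowFunction<TemperatureEvent, TemperatureEvent, Tuple, TimeWindow> {
    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<TemperatureEvent> values, Collector<TemperatureEvent> out) {
        // Assumes getTemperature()/setTemperature() work with double; adjust if the field is integral.
        double sum = 0.0;
        int count = 0;
        TemperatureEvent result = null;
        for (TemperatureEvent value : values) {
            if (result == null) {
                result = value; // reuse the first event to carry the average
            }
            sum += value.getTemperature();
            count++;
        }
        if (result == null) {
            return; // empty window: nothing to emit, and no division by zero
        }
        result.setTemperature(sum / count);
        out.collect(result);
    }
}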
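Whether the average keeps its fractional part depends on the type used for the sum in the division step. The following is a minimal plain-Java sketch of that arithmetic, with no Flink dependency. AverageDivisionDemo and its two methods are hypothetical names made up for illustration, and the readings are invented sample values.

```java
// Hypothetical helper, not part of Flink: isolates the division step of an averaging loop.
// Both methods assume a non-empty array.
final class AverageDivisionDemo {
    // Integer arithmetic: the fractional part of the mean is discarded.
    static long truncatedAverage(long[] temperatures) {
        long sum = 0L;
        for (long t : temperatures) {
            sum += t;
        }
        return sum / temperatures.length;
    }

    // Floating-point arithmetic: the fractional part is kept.
    static double exactAverage(long[] temperatures) {
        double sum = 0.0;
        for (long t : temperatures) {
            sum += t;
        }
        return sum / temperatures.length;
    }

    public static void main(String[] args) {
        long[] readings = {20L, 21L}; // invented sample values
        System.out.println(truncatedAverage(readings)); // prints 20
        System.out.println(exactAverage(readings));     // prints 20.5
    }
}
```

If the temperature field is integral and a truncated mean is acceptable, the long version is enough; otherwise accumulating into a double is the safer choice.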
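Note: if your function can be called on an empty window (for example, when using a custom trigger), you need to check that the window contains elements before calling values.iterator().next() and before dividing by count.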
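The empty-window check from the note can be sketched outside Flink as a small accumulator that tracks only a running sum and a count. AverageAccumulator and its methods are hypothetical names for illustration, not Flink classes, and the sketch assumes temperatures are handled as double.

```java
import java.util.OptionalDouble;

// Hypothetical accumulator, independent of Flink: keeps a running sum and count.
final class AverageAccumulator {
    private double sum = 0.0;
    private long count = 0L;

    // Fold one reading into the running totals.
    void add(double temperature) {
        sum += temperature;
        count++;
    }

    // Returns an empty result when nothing was added, so callers never divide by zero.
    OptionalDouble average() {
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(sum / count);
    }

    public static void main(String[] args) {
        AverageAccumulator acc = new AverageAccumulator();
        System.out.println(acc.average().isPresent()); // prints false
        acc.add(20.0);
        acc.add(23.0);
        System.out.println(acc.average().getAsDouble()); // prints 21.5
    }
}
```

Inside a window function, one could call add for every event in the window and emit a result only when average() is present.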