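I'm very new to Flink. I have this code that maps, groups, and sums the values of an input JSON.
It is very similar to the word count example.
I expected to get (vacant,1) (occupied,2).
However, for some reason I get (occupied,1) (vacant,1) (occupied,2).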
    public static void main(String[] args) throws Exception {
        String s = "{\n" +
                "    \"Port_128\": \"occupied\",\n" +
                "    \"Port_129\": \"occupied\",\n" +
                "    \"Port_120\": \"vacant\"\n" +
                "\n" +
                "}";
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> in = env.fromElements(s);
        SingleOutputStreamOperator<Tuple2<String, Integer>> t =
                in.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String s, Collector<Tuple2<String, Integer>>
                            collector) throws Exception {
                        // ObjectMapper and JsonNode come from Jackson.
                        ObjectMapper mapper = new ObjectMapper();
                        JsonNode node = mapper.readTree(s);
                        // Emit (value, 1) for every field value in the JSON object.
                        node.elements().forEachRemaining(v -> {
                            collector.collect(new Tuple2<>(v.textValue(), 1));
                        });
                    }
                }).keyBy(0).sum(1); // key by the status string, sum the counts
        t.print();
        env.execute();
    }
Answer 0 (score: 1)
Running your code, I get:
10/19/2017 11:27:38 Keyed Aggregation -> Sink: Unnamed(1/1) switched to RUNNING
(occupied,1)
(occupied,2)
(vacant,1)
10/19/2017 11:28:03 Keyed Aggregation -> Sink: Unnamed(1/1) switched to FINISHED
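This is slightly, but importantly, different from your output. The reason is that the code emits the running sum for each key as each element arrives. It first receives the first occupied (and outputs 1), then the second occupied (the sum in that keyed process is now 2, so it outputs 2), and then vacant is sent to a different keyed process, which outputs 1. So this looks like the correct output to me.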
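To make the "one output per incoming element" behaviour concrete, here is a minimal plain-Java sketch. It is not Flink code and does not use any Flink API; the class name RollingSumDemo and the method rollingSums are made up for illustration. It only mimics what a rolling keyed sum does: update the total for the element's key, then emit the updated total straight away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, no Flink involved.
class RollingSumDemo {

    // Emits one "(key,runningTotal)" record per input element, the way a
    // rolling keyed sum reports its updated state after every element.
    static List<String> rollingSums(List<String> values) {
        Map<String, Integer> totals = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String value : values) {
            // merge() stores and returns the updated count for this key.
            int updated = totals.merge(value, 1, Integer::sum);
            emitted.add("(" + value + "," + updated + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Values in the order they appear in the JSON from the question.
        List<String> values = Arrays.asList("occupied", "occupied", "vacant");
        rollingSums(values).forEach(System.out::println);
    }
}
```

With the three values from the question this emits three records, one per element, which is why (occupied,1) shows up before (occupied,2).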
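Edit
Following the comments below, here is code that gives you the desired output. It uses the batch DataSet API (ExecutionEnvironment and groupBy) instead of the streaming API, so each key is summed over the whole input before anything is printed: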
    public static void main(String[] args) throws Exception {
        String s = "{\n" +
                "    \"Port_128\": \"occupied\",\n" +
                "    \"Port_129\": \"occupied\",\n" +
                "    \"Port_120\": \"vacant\"\n" +
                "\n" +
                "}";
        ExecutionEnvironment env =
                ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> in = env.fromElements(s);
        AggregateOperator<Tuple2<String, Integer>> t =
                in.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String s, Collector<Tuple2<String, Integer>>
                            collector) throws Exception {
                        // ObjectMapper and JsonNode come from Jackson.
                        ObjectMapper mapper = new ObjectMapper();
                        JsonNode node = mapper.readTree(s);
                        // Emit (value, 1) for every field value in the JSON object.
                        node.elements().forEachRemaining(v -> {
                            collector.collect(new Tuple2<>(v.textValue(), 1));
                        });
                    }
                }).groupBy(0).sum(1); // group by the status string, sum the counts
        // In the DataSet API, print() already triggers execution. Calling
        // env.execute() afterwards fails because no new sinks are defined.
        t.print();
    }
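For comparison, here is a minimal plain-Java sketch of what grouping a bounded input amounts to. Again this is not Flink code, and the names FinalSumDemo and finalSums are made up for illustration. The whole input is consumed first, and only the final total per key is reported. The TreeMap is only there to make the printed order deterministic in this sketch; it says nothing about the order in which Flink prints the groups.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; hypothetical names, no Flink involved.
class FinalSumDemo {

    // Consumes the whole (bounded) input and returns one final total per key.
    static Map<String, Integer> finalSums(List<String> values) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String value : values) {
            totals.merge(value, 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("occupied", "occupied", "vacant");
        // Nothing is printed until all input has been counted.
        finalSums(values).forEach((key, total) ->
                System.out.println("(" + key + "," + total + ")"));
    }
}
```

Each key appears exactly once here, because no intermediate totals are emitted along the way.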