Flink keyBy grouping question

Asked: 2017-10-19 10:08:59

Tags: apache-flink flink-streaming

I'm new to Flink. I have this code that maps, groups, and sums the input JSON.

It is very similar to the word count example.

I expect to get (vacant,1) (occupied,2)

However, for some reason I get (occupied,1) (vacant,1) (occupied,2)

  public static void main(String[] args) throws Exception {
        String s = "{\n" +
                "    \"Port_128\": \"occupied\",\n" +
                "    \"Port_129\": \"occupied\",\n" +
                "    \"Port_120\": \"vacant\"\n" +
                "\n" +
                "}";
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> in = env.fromElements(s);
        // Emit (status, 1) for every port entry, key by the status string, and keep a running sum.
        SingleOutputStreamOperator<Tuple2<String, Integer>> t =
                in.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                        ObjectMapper mapper = new ObjectMapper();
                        JsonNode node = mapper.readTree(s);
                        node.elements().forEachRemaining(v -> {
                            collector.collect(new Tuple2<>(v.textValue(), 1));
                        });
                    }
                }).keyBy(0).sum(1);

        t.print();
        env.execute();
  }

1 Answer:

Answer 0 (score: 1)

Running your code, I get:

10/19/2017 11:27:38 Keyed Aggregation -> Sink: Unnamed(1/1) switched to RUNNING 
(occupied,1)
(occupied,2)
(vacant,1)
10/19/2017 11:28:03 Keyed Aggregation -> Sink: Unnamed(1/1) switched to FINISHED 

This is slightly different from your output, and the difference matters. The reason is that the code emits a running sum for each key as records arrive: it first sees one "occupied" and outputs (occupied,1), then the second one, at which point the sum for that key is 2, so it outputs (occupied,2); "vacant" is routed to a different keyed task, which outputs (vacant,1). So this looks like the correct output to me.
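As a side note, here is a minimal sketch of getting only the final counts while staying on the DataStream API. It assumes a much newer Flink release (1.12 or later, which did not exist when this question was asked), where a bounded job can run in BATCH execution mode so the keyed aggregation emits only the final value per key; the class name PortStatusCount and the job name are made up for illustration.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class PortStatusCount {
  public static void main(String[] args) throws Exception {
    String s = "{ \"Port_128\": \"occupied\", \"Port_129\": \"occupied\", \"Port_120\": \"vacant\" }";

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // The source is bounded, so BATCH mode is allowed; keyed aggregations then
    // emit only the final sum per key instead of one update per incoming record.
    env.setRuntimeExecutionMode(RuntimeExecutionMode.BATCH);

    env.fromElements(s)
        .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
          @Override
          public void flatMap(String json, Collector<Tuple2<String, Integer>> out) throws Exception {
            // Emit (status, 1) for every port entry in the JSON object.
            JsonNode node = new ObjectMapper().readTree(json);
            node.elements().forEachRemaining(v -> out.collect(new Tuple2<>(v.textValue(), 1)));
          }
        })
        .keyBy(t -> t.f0) // key by the status string ("occupied" / "vacant")
        .sum(1)
        .print();

    env.execute("port status count");
  }
}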

EDIT

Following the comments below, here is code that gives you the desired output:

public static void main(String[] args) throws Exception {
  String s = "{\n" +
      "    \"Port_128\": \"occupied\",\n" +
      "    \"Port_129\": \"occupied\",\n" +
      "    \"Port_120\": \"vacant\"\n" +
      "\n" +
      "}";
  ExecutionEnvironment env =
      ExecutionEnvironment.getExecutionEnvironment();
  DataSet<String> in = env.fromElements(s);
  AggregateOperator<Tuple2<String, Integer>> t =
      in.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>>
            collector) throws Exception {
          ObjectMapper mapper = new ObjectMapper();
          JsonNode node = mapper.readTree(s);
          node.elements().forEachRemaining(v -> {
            collector.collect(new Tuple2<>(v.textValue(), 1));
          });

        }
      }).groupBy(0).sum(1);

  // Note: DataSet#print() triggers execution on its own, so a separate
  // env.execute() call is not needed here (it would fail because no new sinks are defined).
  t.print();
}
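
With this DataSet version, printing the aggregated result should show one final line per key, i.e. (occupied,2) and (vacant,1); since the result set has no defined ordering, the two lines may appear in either order.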