How to join the result of mapWithState with the input string for output?

Time: 2016-11-25 14:22:06

Tags: apache-spark spark-streaming

I fully understand how to use mapWithState from the JavaStatefulNetworkWordCount example. However, I have a question. Imagine I have this JSON line:

{"device":"dv1","parameter1":"vv1","parameter2":"vv2"} 

Using JavaStatefulNetworkWordCount plus my own code to parse the JSON, I can count how many times dv1 occurs. The displayed result is, for example:

(dv1, 51). 
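The JSON-parsing step, which is only a placeholder in the `words` mapping of the code, could be sketched as follows. This is a minimal sketch, assuming flat single-line JSON with a string-valued "device" field; it uses a regular expression instead of a JSON library, and `DeviceExtractor` and `extractDevice` are hypothetical names. For nested or escaped JSON, a real parser such as Jackson would be safer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the "device" value out of a flat JSON line.
// Assumes the value is a plain string without escaped quotes.
public class DeviceExtractor {
    private static final Pattern DEVICE =
        Pattern.compile("\"device\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the device name, or an empty string if the field is missing.
    public static String extractDevice(String jsonLine) {
        Matcher m = DEVICE.matcher(jsonLine);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String line = "{\"device\":\"dv1\",\"parameter1\":\"vv1\",\"parameter2\":\"vv2\"}";
        System.out.println(extractDevice(line)); // prints dv1
    }
}
```

In the Spark job, `lines.map(DeviceExtractor::extractDevice)` should be able to replace the placeholder lambda, since the method reference matches Spark's single-argument `Function<String, String>`.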

Now I would like to include that count in the JSON line, to get this output:

{"device":"dv1","parameter1":"vv1","parameter2":"vv2","increment":51}

Do you have any idea how to achieve this? I don't see how to do it with my existing code.

My code so far:

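import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.api.java.function.Function3;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.State;
import org.apache.spark.streaming.StateSpec;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaMapWithStateDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

/**
 * Counts words
 * To run this on your local machine, you need to first run a Netcat server
 * `$ nc -lk 9999`
 * and then run the example
 * `$ bin/run-example
 * org.apache.spark.examples.streaming.JavaStatefulNetworkWordCount localhost 9999`
 */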
public class JavaStatefulNetworkWordCount {
  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {
    if (args.length < 2) {
      System.err.println("Usage: JavaStatefulNetworkWordCount <hostname> <port>");
      System.exit(1);
    }

    // Create the context with a 2 second batch size
    SparkConf sparkConf = new SparkConf().setAppName("test1").setMaster("local[*]");
    JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
    ssc.checkpoint(".");

    // Initial state RDD input to mapWithState
    List<Tuple2<String, Integer>> tuples =
        Arrays.asList();
    JavaPairRDD<String, Integer> initialRDD = ssc.sparkContext().parallelizePairs(tuples);

    JavaReceiverInputDStream<String> lines = ssc.socketTextStream(
            args[0], Integer.parseInt(args[1]), StorageLevels.MEMORY_AND_DISK_SER_2);

    JavaDStream<String> words = lines.map(x -> {
        String deviceName = "";
        // TODO: extract the device name (for instance dv1) from the JSON line x
        return deviceName;
    });

    JavaPairDStream<String, Integer> wordsDstream = words.mapToPair(
        s -> new Tuple2<>(s, 1));

    // Mapping function: adds this batch's occurrence to the running count per key
    Function3<String, Optional<Integer>, State<Integer>, Tuple2<String, Integer>> mappingFunc =
        (word, one, state) -> {
          int sum = one.orElse(0) + (state.exists() ? state.get() : 0);
          Tuple2<String, Integer> output = new Tuple2<>(word, sum);
          state.update(sum);
          return output;
        };

    // DStream of cumulative counts, updated in every batch
    JavaMapWithStateDStream<String, Integer, Integer, Tuple2<String, Integer>> stateDstream =
        wordsDstream.mapWithState(StateSpec.function(mappingFunc).initialState(initialRDD));

    stateDstream.print();
    ssc.start();
    ssc.awaitTermination();
  }
}
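One possible approach, sketched under assumptions rather than as a confirmed answer: instead of pairing each device with the constant `1`, pair it with the whole JSON line, so the mapping function receives the line as its value and can return it with the updated count appended. In the Spark job above this would mean something like `lines.mapToPair(line -> new Tuple2<>(deviceOf(line), line))` producing a `JavaPairDStream<String, String>`, a mapping function of type `Function3<String, Optional<String>, State<Integer>, String>`, and a `JavaMapWithStateDStream<String, String, Integer, String>`; `deviceOf` is a hypothetical extraction helper. The state type stays `Integer`, so the existing `initialRDD` could still be passed to `initialState`. The block below models only the per-key logic in plain Java so it runs without Spark: a `HashMap` stands in for Spark's `State<Integer>`, `java.util.Optional` stands in for Spark's own `Optional` (both have `isPresent()` and `get()`), and `EnrichingCounter`, `map` and `withIncrement` are hypothetical names. `withIncrement` assumes a non-empty, flat JSON object ending with `}`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of a mapWithState mapping function (not Spark code).
// The HashMap stands in for Spark's per-key State<Integer>.
public class EnrichingCounter {
    private final Map<String, Integer> state = new HashMap<>();

    // Same shape as (word, one, state) -> ... in the question, except that the
    // value is the original JSON line rather than the constant 1.
    public String map(String device, Optional<String> line) {
        if (!line.isPresent()) {
            return null;                      // nothing to enrich for this key
        }
        int sum = state.getOrDefault(device, 0) + 1;
        state.put(device, sum);               // corresponds to state.update(sum)
        return withIncrement(line.get(), sum);
    }

    // Hypothetical helper: appends "increment":<count> to a flat JSON object.
    // Assumes a non-empty object whose last '}' closes the object.
    static String withIncrement(String json, int count) {
        String trimmed = json.trim();
        int close = trimmed.lastIndexOf('}');
        return trimmed.substring(0, close) + ",\"increment\":" + count + "}";
    }

    public static void main(String[] args) {
        EnrichingCounter counter = new EnrichingCounter();
        String line = "{\"device\":\"dv1\",\"parameter1\":\"vv1\",\"parameter2\":\"vv2\"}";
        counter.map("dv1", Optional.of(line));
        // Second occurrence of dv1: the emitted line carries "increment":2
        System.out.println(counter.map("dv1", Optional.of(line)));
    }
}
```

Keying by device and carrying the line as the value keeps the running count and the original line together inside the same function call, so no separate join of two DStreams is needed.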

Thanks in advance,

Ĵ

0 answers