Stopping a Spark stream

Time: 2016-02-10 12:17:03

Tags: java apache-spark spark-streaming

I want to stop the Java streaming context in Spark after 100 records from a file have been processed. The problem is that the code inside the if statement is not executed when the stream starts. The following code illustrates what I am trying to do:

    public static void main(String[] args) throws Exception {

        int ff = testSparkStreaming();

        System.out.println("wqwqwq");
        System.out.println(ff);

    }


    public static int testSparkStreaming() throws IOException, InterruptedException {

        int numberInst = 0;
        String savePath = "Path to Model";
        // jssc is assumed to be a JavaStreamingContext created elsewhere
        final NaiveBayesModel savedModel = NaiveBayesModel.load(jssc.sparkContext().sc(), savePath);

        BufferedReader br = new BufferedReader(new FileReader("C://testStream//copy.csv"));
        Queue<JavaRDD<String>> rddQueue = new LinkedList<JavaRDD<String>>();
        List<String> list = Lists.newArrayList();
        String line = "";
        while ((line = br.readLine()) != null) {
            list.add(line);
        }
        br.close();

        rddQueue.add(jssc.sparkContext().parallelize(list));
        numberInst+= list.size();
        JavaDStream<String> dataStream = jssc.queueStream(rddQueue);
        dataStream.print();

        if (numberInst == 100){
             System.out.println("should stop");
             jssc.wait();
        }
        jssc.start();
        jssc.awaitTermination();

        return numberInst;

    }

My question is: how can I stop the streaming when numberInst == 100 and hand execution back to the main method so the statements after the call can run?

P.S.: In the code above, the if statement is not executed:

        if (numberInst == 100){
             System.out.println("should stop");
             jssc.wait();
        }

2 Answers:

Answer 0 (score: 2):

You could try this:

    jssc.start();

    while (numberInst < 100){
        jssc.awaitTerminationOrTimeout(1000); // 1 second polling time, you can change it as per your usecase
    }

    jssc.stop();
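Spark aside, the control flow this answer suggests (poll a shared counter, then shut the job down once the threshold is reached) can be shown with plain Java threads. This is an illustrative sketch only; the class name, counter, and timings are made up, and a worker thread stands in for the streaming job:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PollingStopDemo {
    // Polls a shared counter until `threshold` records have been "processed",
    // then interrupts the worker -- the same shape as the answer's
    // awaitTerminationOrTimeout loop followed by jssc.stop().
    static int runUntil(int threshold) {
        AtomicInteger numberInst = new AtomicInteger(0);
        Thread worker = new Thread(() -> {
            // Stands in for the streaming job: count one "record" per tick.
            while (!Thread.currentThread().isInterrupted()) {
                numberInst.incrementAndGet();
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return; // stop requested
                }
            }
        });
        worker.start();
        try {
            while (numberInst.get() < threshold) {
                Thread.sleep(10); // short polling interval, like the 1-second timeout
            }
            worker.interrupt();   // analogous to jssc.stop()
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return numberInst.get();
    }

    public static void main(String[] args) {
        System.out.println("stopped after " + runUntil(100) + " records");
    }
}
```

The key point in both versions is that the stopping decision is made by the polling thread, not inside the stream itself, which is why the original `if (numberInst == 100)` check placed before `jssc.start()` can never react to records processed later.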

Answer 1 (score: 0):

Have you tried stopping it the way you would a thread, i.e. by interrupting it?