Reading files from a folder in Spark Streaming mode

Time: 2017-05-31 09:57:43

Tags: java file spark-streaming

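I want to read some text files from a folder and perform some operations on them. I should mention that I am using IntelliJ IDEA on Mac OS. Here is what I tried in my tests: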

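1- I copied and pasted the files into the folder while the program was running

2- I moved them in from another folder

3- I renamed them

4- I added new files for each test

5- I tested with both Scala and Java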
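The Spark Streaming programming guide asks that files be created in the monitored directory by atomically moving or renaming them into it. Below is a minimal sketch of doing that from Java with `java.nio.file`, assuming a hypothetical `staging` folder on the same file system as `inputfolder` (an atomic move generally cannot cross file systems). `DropFile` and `dropFile` are illustrative names, not part of Spark. Because the staging file is written immediately before the move, its modification time reflects when the data was written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for feeding a folder watched by textFileStream.
final class DropFile {

    // Writes the lines to a temporary file in stagingDir, then atomically moves
    // it into inputDir under the given name and returns the final path.
    static Path dropFile(Path stagingDir, Path inputDir, String name, List<String> lines)
            throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(inputDir);
        Path tmp = Files.createTempFile(stagingDir, "part-", ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        // ATOMIC_MOVE fails if the two folders are on different file systems
        return Files.move(tmp, inputDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: DropFile <baseFolder>");
            return;
        }
        Path base = Paths.get(args[0]);
        // A unique file name per run, so an existing file is never overwritten
        Path created = dropFile(base.resolve("staging"), base.resolve("inputfolder"),
                "words-" + System.currentTimeMillis() + ".txt",
                Arrays.asList("hello spark", "hello streaming"));
        System.out.println("Created " + created);
    }
}
```

For example, running it with `/Users/saeedtkh/Desktop/SparkStreamingWordCountFolder` as the argument while the streaming job is running would place a new file in `inputfolder`.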

Although I tested under each of the conditions above, nothing was ever read from the folder. Here is my Java code:

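import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Class wrapper added so the snippet compiles; Split, WordOne and Sum are the
// author's own function classes, which are not shown in the question.
public class SparkStreamingWordCount {

    // awaitTerminationOrTimeout() can throw InterruptedException, so main declares it
    public static void main(String[] args) throws InterruptedException {

        String outputPathPrefix;
        String inputFolder;

        inputFolder = "/Users/saeedtkh/Desktop/SparkStreamingWordCountFolder/inputfolder";
        outputPathPrefix = "/Users/saeedtkh/Desktop/SparkStreamingWordCountFolder/output/";

        // Create a configuration object and set the name of the application
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("Spark Streaming word count");

        // Create a Spark Streaming context with a 10-second batch interval
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Create a DStream that monitors the input folder for new text files
        JavaDStream<String> lines = jssc.textFileStream(inputFolder);

        // Apply the "standard" word count transformations; here they operate
        // on DStreams (JavaDStream/JavaPairDStream) rather than on single RDDs
        JavaDStream<String> words = lines.flatMap(new Split());

        JavaPairDStream<String, Integer> wordsOnes = words.mapToPair(new WordOne());

        JavaPairDStream<String, Integer> wordsCounts = wordsOnes.reduceByKey(new Sum());

        wordsCounts.print();

        wordsCounts.dstream().saveAsTextFiles(outputPathPrefix, "");

        // Start the computation
        jssc.start();

        // Run for at most 120 seconds (120000 ms)
        jssc.awaitTerminationOrTimeout(120000);

        jssc.close();
    }
}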
Can you help me find the problem? Am I missing something?
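Recent versions of the Spark Streaming programming guide state that a file stream assigns each file to a time period by its modification time, not its creation time. Moving or renaming a file within the same file system normally keeps its original modification time, so one thing that may be worth checking is whether the test files' timestamps predate the moment the job was started. The sketch below lists such files; `StaleFileCheck` and `staleFiles` are hypothetical names, and skipping names that start with "." is an assumption about how Spark's default file filter treats hidden files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: which files in the folder were last modified
// at or before the instant the streaming job started?
final class StaleFileCheck {

    static List<Path> staleFiles(Path dir, Instant start) throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                // Ignore sub-folders and hidden files such as .DS_Store
                if (!Files.isRegularFile(p) || p.getFileName().toString().startsWith(".")) {
                    continue;
                }
                FileTime modified = Files.getLastModifiedTime(p);
                if (!modified.toInstant().isAfter(start)) {
                    stale.add(p);
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: StaleFileCheck <inputFolder> <jobStartInstant, e.g. 2017-05-31T09:57:43Z>");
            return;
        }
        for (Path p : staleFiles(Paths.get(args[0]), Instant.parse(args[1]))) {
            System.out.println("modified before job start: " + p
                    + " (" + Files.getLastModifiedTime(p) + ")");
        }
    }
}
```

If files show up in this list, giving them a fresh modification time (for example by writing them anew instead of moving existing ones) would be a reasonable next experiment.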

0 answers