Question

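I am learning Spark for distributed systems. I ran this code and it worked. I know that it counts the words in the input file, but I have trouble understanding how the methods are written and what JavaRDD is used for.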

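import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class JavaWordCount {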

public static void main(String[] args) throws Exception {

    System.out.println("Program starting");
    //String inputFile = "/mapr/demo.mapr.com/TestMapr/Input/alice.txt";
    String inputFile = args[0];
    String outputFile = args[1];
    // Create a Java Spark Context.
    System.out.println("Creating a Java Spark context");

    SparkConf conf = new SparkConf().setAppName("JavaWordCount");
    JavaSparkContext sc = new JavaSparkContext(conf);
    System.out.println("Context created");

    // Load our input data: one RDD element per line of the file.
    JavaRDD<String> input = sc.textFile(inputFile);



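    // Split each line into words.
    // Note: this is the Spark 1.x signature (call returns Iterable);
    // from Spark 2.0 on, FlatMapFunction.call must return an Iterator.
    System.out.println("Splitting each line into words");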

    JavaRDD<String> words = input.flatMap(
            new FlatMapFunction<String, String>() {
                @Override
                public Iterable<String> call(String x) {
                    return Arrays.asList(x.split(" "));
                }
            }
    );
    System.out.println("Turning the words into (word, 1) pairs");

    // Turn the words into (word, 1) pairs
    JavaPairRDD<String, Integer> wordOnePairs = words.mapToPair(
            new PairFunction<String, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(String x) {
                    return new Tuple2<String, Integer>(x, 1);
                }
            }
    );

    System.out.println("Reducing the pairs by key to produce counts");

    // Reduce the pairs by key: add up the 1s that share the same word.
    JavaPairRDD<String, Integer> counts = wordOnePairs.reduceByKey(
            new Function2<Integer, Integer, Integer>() {
                @Override
                public Integer call(Integer x, Integer y) {
                    return x + y;
                }
            }
    );


    System.out.println("Saving the word counts to a text file, which triggers evaluation.");

    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile(outputFile);
    System.out.println(counts.collect());
    sc.close();
}

}

Answer 1

As PinoSan mentioned, this question is probably too generic, and you should be able to find the answer in any Spark introduction or tutorial.

Let me point you to some interesting resources:

Disclaimer: I work for MapR, which is why I am pointing to Spark resources on the MapR website.
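
To make the "what do these methods do" part of the question a bit more concrete, here is a rough sketch of the same three steps (flatMap, mapToPair, reduceByKey) written with plain Java collections instead of Spark. The class `WordCountSketch` and the method `countWords` are made-up names for illustration only; they are not part of the Spark API. A JavaRDD can be thought of as a collection like the `List<String>` below, except that it is partitioned across the cluster and only evaluated when an action such as saveAsTextFile or collect is called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Spark: it mimics the RDD steps on one machine.
class WordCountSketch {

    // Local equivalent of flatMap -> mapToPair -> reduceByKey from the question.
    static Map<String, Integer> countWords(List<String> lines) {
        // TreeMap keeps the keys sorted so the printed result is easy to read.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // flatMap step: one line becomes many words.
            for (String word : line.split(" ")) {
                // mapToPair emits (word, 1); reduceByKey adds the 1s that share a key.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        System.out.println(countWords(lines)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

In the Spark version each of these steps returns a new RDD rather than modifying a map in place, which is what lets Spark run the work in parallel on several machines.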

Beginner at Spark big data programming (Spark code)