我为分布式系统学习火花。我运行了这个代码并且它已经运行了。 但是我知道它在输入文件中计算了一下,但是我有一些问题,但是我不知道如何编写方法以及JavaRDD的用途
public class JavaWordCount {
public static void main(String[] args) throws Exception {
System.out.print("le programme commence");
//String inputFile = "/mapr/demo.mapr.com/TestMapr/Input/alice.txt";
String inputFile = args[0];
String outputFile = args[1];
// Create a Java Spark Context.
System.out.print("le programme cree un java spark contect");
SparkConf conf = new SparkConf().setAppName("JavaWordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
// Load our input data.
System.out.print("Context créeS");
JavaRDD<String> input = sc.textFile(inputFile);
// map/split each line to multiple words
System.out.print("le programme divise le document en multiple line");
JavaRDD<String> words = input.flatMap(
new FlatMapFunction<String, String>() {
@Override
public Iterable<String> call(String x) {
return Arrays.asList(x.split(" "));
}
}
);
System.out.print("Turn the words into (word, 1) pairse");
// Turn the words into (word, 1) pairs
JavaPairRDD<String, Integer> wordOnePairs = words.mapToPair(
new PairFunction<String, String, Integer>() {
@Override
public Tuple2<String, Integer> call(String x) {
return new Tuple2(x, 1);
}
}
);
System.out.print(" // reduce add the pairs by key to produce counts");
// reduce add the pairs by key to produce counts
JavaPairRDD<String, Integer> counts = wordOnePairs.reduceByKey(
new Function2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer x, Integer y) {
return x + y;
}
}
);
System.out.print(" Save the word count back out to a text file, causing evaluation.");
// Save the word count back out to a text file, causing evaluation.
counts.saveAsTextFile(outputFile);
System.out.println(counts.collect());
sc.close();
}
}
答案 0 :(得分:-1)
正如PinoSan所提到的,这个问题可能过于通用了,您应该能够在任何Spark入门或教程中找到答案。
让我指出一些有趣的内容:
免责声明:我正在为MapR工作,这就是我将MapR网站上的资源放在Spark上的原因