我配置了我的群集(1个主/ 9个从属)。
我的问题是,当我提交一个应用程序(通过spark-submit
和deploy-mode cluster
的单词时),即使数据很少,驱动程序也不会停止。
我提交了这样的应用程序:
./spark-submit \
--class wordCount \
--master spark://master:6066 --deploy-mode cluster --supervise \
--executor-cores 1 --total-executor-cores 3 --executor-memory 1g \
hdfs://master:9000/user/exemple/word3.jar \
hdfs://master:9000/user/exemple/texte.txt
hdfs://master:9000/user/exemple/result 2
这是我的计划:
import org.apache.spark.SparkContext import
org.apache.spark.SparkContext._ import org.apache.spark.SparkConf
object SparkWordCount { def main(args: Array[String]) {
// create Spark context with Spark configuration
val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
// get threshold
val threshold = args(1).toInt
// read in text file and split each document into words
val tokenized = sc.textFile(args(0)).flatMap(_.split(" "))
// count the occurrence of each word
val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)
// filter out words with fewer than threshold occurrences
val filtered = wordCounts.filter(_._2 >= threshold)
// count characters
val charCounts = filtered.flatMap(_._1.toCharArray).map((_, 1)).reduceByKey(_ + _)
System.out.println(charCounts.collect().mkString(", ")) } }