apache-spark-sql - Spark Streaming program cannot write its output on a cluster

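The last part of my code tries to write its output while running on a cluster. It works fine in local mode on my local machine, but nothing gets written when it runs on the YARN cluster.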

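import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.Time

// No URI scheme here, so the path is resolved against the Hadoop default file system (fs.defaultFS).
val outputPath = "/tmp/myStream/"
outputDStream.foreachRDD((rdd: RDD[Test], time: Time) => {
  // Reuse (or lazily create) the SparkSession that matches this RDD's SparkContext.
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._
  val df = rdd.toDF

  // Write each batch as JSON into its own directory, named after the batch time.
  df.write
    .mode(SaveMode.Append)
    .format("json")
    .save(s"${outputPath}${time.milliseconds}")
})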

This is the command I used to run it.

nohup spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.allowMultipleContexts=true \
  --conf spark.executor.cores=10 \
  --conf spark.executor.memory=9g \
  --conf spark.streaming.receiver.writeAheadLog.enable=true \
  --class com.test.mainClass \
  /tmp/aws-test-1.0-SNAPSHOT-jar-with-dependencies.jar

Do I need to keep it as an RDD and collect it in order to write it to HDFS from wherever the code is running?
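
For what it is worth, collecting should not be necessary: `df.write` runs on the executors, and each one writes its partitions straight to the target file system. A path with no URI scheme, such as `/tmp/myStream/`, is resolved against the Hadoop default file system (`fs.defaultFS`), which on a YARN cluster is usually HDFS rather than the local disk. The sketch below is one way to make the target explicit and to log where each batch lands. It assumes the cluster's default file system is HDFS; the fields of the `Test` case class, the `StreamWriter` object, and `writeBatches` are hypothetical names used for illustration, not taken from the original code.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Stand-in for the Test class from the question; its real fields are not shown there.
case class Test(id: Long, value: String)

object StreamWriter {
  // Assumption: the cluster's default file system is HDFS, so "hdfs:///"
  // resolves to the configured NameNode.
  val outputPath = "hdfs:///tmp/myStream/"

  def writeBatches(outputDStream: DStream[Test]): Unit = {
    outputDStream.foreachRDD { (rdd: RDD[Test], time: Time) =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._

      // Resolve the per-batch directory to a fully qualified URI and log it,
      // so the driver log shows exactly where the batch was written.
      val target = new Path(s"$outputPath${time.milliseconds}")
      val fs = target.getFileSystem(rdd.sparkContext.hadoopConfiguration)
      val qualified = fs.makeQualified(target)
      println(s"Writing batch ${time.milliseconds} to $qualified")

      // Executors write their partitions directly; no collect() is involved.
      rdd.toDF().write
        .mode(SaveMode.Append)
        .format("json")
        .save(qualified.toString)
    }
  }
}
```

If the write succeeds, running `hdfs dfs -ls /tmp/myStream/` on the cluster should list one directory per batch. In cluster deploy mode the `println` output ends up in the driver's YARN container log, not in the terminal that ran `spark-submit`.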


0 answers: