Question

I have set up Spark 2.2 locally and am using Scala.

My Spark session configuration is below:

val sparkSession = SparkSession
  .builder()
  .appName("My application")
  .config("es.nodes", "localhost:9200")
  .config("es.index.auto.create", true)
  .config("spark.streaming.backpressure.initialRate", "1")
  .config("spark.streaming.kafka.maxRatePerPartition", "7")
  .master("local[2]")
  .enableHiveSupport()
  .getOrCreate()

I am running Spark on my local machine.

When I do this:

kafkaStream.foreachRDD(rdd => {
  calledFunction(rdd)
})

def calledFunction(rdd: RDD[ConsumerRecord[String, String]]): Unit = {
  rdd.foreach(r => {
    print("hello")
  })
}

With the above code, "hello" is not printed on my local machine, but all the jobs are queued.

If I change the code to

kafkaStream.foreachRDD(rdd => { rdd.foreach(r=>{ print("hello")}) })

then "hello" is printed on the console.

Can you help me figure out what the problem is here?

Answer 1

With Spark 1.6, it does print "hello" on the console. Here is the sample code for reference:

val message = KafkaUtils.createStream[Array[Byte], String, DefaultDecoder, StringDecoder](
  ssc,
  kafkaConf,
  Map("test" -> 1),
  StorageLevel.MEMORY_ONLY
)
val lines = message.map(_._2)
lines.foreachRDD(rdd => { calledFunction(rdd) })

def calledFunction(rdd: RDD[String]): Unit = {
  rdd.foreach(r => {
    print("hello")
  })
}

Hope this helps. Because of a dependency mismatch, I currently cannot reproduce the same problem with Spark 2.0.
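If it helps to narrow things down, here is a minimal sketch (not a diagnosis) that checks whether a named function called from foreachRDD actually runs, without relying on console output. It assumes Spark 2.x with spark-streaming on the classpath. The object name `ForeachRddSketch`, the method `run`, the accumulator, and the use of `queueStream` as a stand-in for the Kafka stream are all illustrative assumptions and are not part of the original code.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.util.LongAccumulator

import scala.collection.mutable

object ForeachRddSketch {

  // Hypothetical stand-in for calledFunction: it counts records instead of
  // printing, so the result can be read back on the driver.
  def calledFunction(rdd: RDD[String], seen: LongAccumulator): Unit =
    rdd.foreach(_ => seen.add(1))

  def run(): Long = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("foreachRDD sketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    val seen = ssc.sparkContext.longAccumulator("records seen")

    // Assumption: one pre-built batch replaces the Kafka stream, so the
    // sketch needs no broker.
    val queue = mutable.Queue(ssc.sparkContext.parallelize(Seq("a", "b", "c")))
    val stream = ssc.queueStream(queue)

    // Same shape as in the question: foreachRDD delegates to a named function.
    stream.foreachRDD(rdd => calledFunction(rdd, seen))

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few 1-second batches run
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    seen.sum
  }

  def main(args: Array[String]): Unit =
    println(s"records seen: ${run()}")
}
```

If the count is non-zero here but the Kafka version still prints nothing, the function call itself is probably not the problem, and the Kafka source or how stdout is handled would be the next thing to look at.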

Spark Streaming with Kafka: code inside foreachRDD does not execute if I call a function locally
