DStream[String].foreachRDD on a Spark cluster

Date: 2015-10-06 06:34:30

Tags: scala apache-spark bigdata apache-kafka spark-streaming

I am new to Spark; can someone help me with the following?

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

def streamStart() {
  // Run against the standalone master and ship the assembly jar to the executors.
  val sparkConf = new SparkConf()
    .setAppName("kafkaStreamingNew!!")
    .setMaster("spark://husnain:7077")
    .setJars(Array("/home/husnain/Downloads/ScalaWorkspace/KafkaStreaming/target/KafkaStreaming-1.1.0-jar-with-dependencies.jar"))
    // Additional jars, commented out:
    // "/home/husnain/.m2/repository/org/apache/spark/spark-streaming-kafka_2.10/1.4.1/spark-streaming-kafka_2.10-1.4.1.jar",
    // "/home/husnain/.m2/repository/org/apache/spark/spark-streaming_2.10/1.4.1/spark-streaming_2.10-1.4.1.jar",
    // "/home/husnain/.m2/repository/org/apache/spark/spark-core_2.10/1.4.1/spark-core_2.10-1.4.1.jar"
  val ssc = new StreamingContext(sparkConf, Seconds(1))

  val topics = "test"
  ssc.checkpoint("checkpoint")
  // Consume the "test" topic via ZooKeeper at localhost:2181 and keep only the message values.
  val lines = KafkaUtils.createStream(ssc, "localhost:2181", "spark", Map("test" -> 1)).map(_._2)
  lines.print()
  println("*****************************************************************************")
  lines.foreachRDD(
    iter => iter.foreach(
      x => println(x + "\n***-------------------------------------------------------***\n")))
  println("-----------------------------------------------------------------------------")
  ssc.start()
  ssc.awaitTermination()
}

On a standalone Spark cluster this code does not work, but with local[*] it works fine:

lines.foreachRDD(
  iter => iter.foreach(
    x => println(x + "\n***-------------------------------------------------------***\n")
    )
   )

1 Answer:

Answer 0 (score: 0)

I assume that what you call "works fine" is that you see the println output on your console.

When you submit the same code to the cluster, the println to the console happens locally on each executor, so if everything else is working, the missing output is simply a consequence of distributed execution.
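
To make that concrete, here is a small sketch (assuming the same lines DStream from the question) that tags each record with the hostname of the machine that prints it; on a cluster these lines end up in each executor's stdout rather than in the driver's console:

lines.foreachRDD { rdd =>
  rdd.foreach { x =>
    // This closure runs on whichever executor holds the partition,
    // so the output goes to that executor's stdout, not to the driver.
    println(java.net.InetAddress.getLocalHost.getHostName + ": " + x)
  }
}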

Look at the executor output for the printlns instead.
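
If you actually want to see the records in the driver console, one common workaround is to pull a small sample back to the driver inside foreachRDD and print it there. A minimal sketch, assuming the same lines DStream (the limit of 10 records is an arbitrary choice for illustration):

lines.foreachRDD { rdd =>
  // take() returns the sampled records to the driver, so these printlns
  // appear in the driver's console instead of the executors' stdout.
  rdd.take(10).foreach(x => println(x + "\n***-------------------***\n"))
}

collect() would do the same for the entire batch, but only use it if each batch is known to be small enough to fit on the driver.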