No messages received when using foreachPartition in Spark Streaming

Date: 2016-05-20 19:09:00

Tags: apache-kafka spark-streaming

I am using Spark Streaming to pull messages from Kafka. When I use foreachPartition on the RDD, I never receive any messages. If I read the messages from the RDD with foreach, it works fine. But I need the per-partition behavior so that I can have one socket connection on each executor.

Here is the code that connects to Spark and creates the stream:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
  "zookeeper.connect" -> zooKeepers,
  "group.id" -> "metric-group",
  "zookeeper.connection.timeout.ms" -> "5000")
val inputTopic = "threatflow"

val conf = new SparkConf().setAppName(applicationTitle).set("spark.eventLog.overwrite", "true")

val ssc = new StreamingContext(conf, Seconds(5))

// One receiver-based stream per numberOfStreams, all reading the same topic
val streams = (1 to numberOfStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Map(inputTopic -> 1), StorageLevel.MEMORY_ONLY_SER)
}
val kafkaStream = ssc.union(streams)

kafkaStream.foreachRDD { (rdd, time) =>
  calcVictimsProcess(process, rdd, time.milliseconds)
}

ssc.start()
ssc.awaitTermination()
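
As a quick sanity check (just a debugging sketch, not part of the job itself), printing per-batch counts on the driver shows whether the receivers are getting data at all:

// Debug sketch: print how many messages each 5-second batch contains
kafkaStream.count().print()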

Here is my code that tries to process the messages using foreachPartition instead of foreach:
import java.io.BufferedOutputStream
import java.net.{InetAddress, Socket}

val threats = rdd.map(message =>
  gson.fromJson(message._2.substring(1, message._2.length()), classOf[ThreatflowMessage]))

threats.flatMap(mapSrcVictim).reduceByKey((a, b) => a + b).foreachPartition { partition =>
  // one socket per partition, opened on the executor
  val socket = new Socket(InetAddress.getByName("localhost"), 4242)
  val writer = new BufferedOutputStream(socket.getOutputStream)
  partition.foreach { value =>
    val parts = value._1.split("-")
    val put = "put %s %d %d type=%s address=%s unique=%s\n".format(
      "metric", bucket, value._2, parts(0), parts(1), unique)
    writer.write(put.getBytes) // actually send the formatted line
    Thread.sleep(10000)
  }
  writer.flush()
  socket.close()
}
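
One thing worth noting: the closure passed to foreachPartition runs on the executors, so anything it prints or logs ends up in the executor logs, not on the driver console. A debugging sketch that pulls a small sample of the aggregated data back to the driver instead:

// Debug sketch: inspect a few aggregated records on the driver
threats.flatMap(mapSrcVictim).reduceByKey((a, b) => a + b)
  .take(10)
  .foreach(println)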

Simply converting this back to foreach works, as I said, but that is not an option because I need to create the socket per executor.
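
If one connection per executor (rather than per partition) is really what I need, a pattern I have seen suggested is a lazily initialized singleton, since a Scala object's lazy val is created at most once per executor JVM. A sketch along those lines, where ConnectionHolder is a hypothetical helper made up for illustration:

import java.io.BufferedOutputStream
import java.net.{InetAddress, Socket}

// Hypothetical helper: lazy vals are initialized at most once per executor JVM
object ConnectionHolder {
  lazy val socket = new Socket(InetAddress.getByName("localhost"), 4242)
  lazy val writer = new BufferedOutputStream(socket.getOutputStream)
}

threats.flatMap(mapSrcVictim).reduceByKey((a, b) => a + b).foreachPartition { partition =>
  partition.foreach { value =>
    // reuse the executor-wide connection instead of opening one per partition
    ConnectionHolder.writer.write((value.toString + "\n").getBytes)
  }
  ConnectionHolder.writer.flush()
}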

0 Answers