Spark job not publishing messages to a Kafka topic

Date: 2020-06-10 02:28:13

Tags: scala apache-spark apache-kafka spark-streaming kafka-producer-api

I wrote a Spark job that reads a file, converts the data to JSON, and publishes it to Kafka. I have tried every option I could think of:

1. adding a Thread.sleep;
2. setting linger.ms to less than the sleep time;
3. calling producer.flush() / producer.close();
4. checking the logs: I can see my send method being called, including at the end of the job, with no errors.

Nothing helped: there are no errors in the logs, yet nothing ever gets published to Kafka. If I write a plain standalone producer, it publishes messages to the same Kafka topic without any problem, so Kafka itself is not the issue. Please help!
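For reference, a plain standalone producer of the kind described above looks roughly like the sketch below; the broker address, topic name, and payload are placeholders, not the asker's actual values.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal standalone-producer sketch; broker, topic, and payload are placeholders.
object StandaloneProducerCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // send() is asynchronous; get() blocks until the broker acknowledges the record
    producer.send(new ProducerRecord[String, String]("my-topic", """{"hello":"world"}""")).get()
    producer.flush()
    producer.close()
  }
}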

Here are the important files from my project:

build.sbt:

name := "SparkStreamingExample"

// version := "0.1"


scalaVersion := "2.11.8"
val spark = "2.3.1"
val kafka = "0.10.1"
// https://mvnrepository.com/artifact/org.apache.kafka/kafka

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.6"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.6"
dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.6"
// https://mvnrepository.com/artifact/com.fasterxml.jackson.dataformat/jackson-dataformat-cbor
dependencyOverrides += "com.fasterxml.jackson.dataformat" % "jackson-dataformat-cbor" % "2.9.6"
libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "2.0.0"
// https://mvnrepository.com/artifact/org.apache.kafka/kafka
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % spark
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1"
libraryDependencies +="com.typesafe.play" %"play-json_2.11" % "2.6.6" exclude("com.fasterxml.jackson.core","jackson-databind")
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"
libraryDependencies +="com.typesafe" % "config" %"1.3.2"

MySparkKafkaProducer.scala

import java.util.Properties
import java.util.concurrent.Future

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

class MySparkKafkaProducer(createProducer: () => KafkaProducer[String, String]) extends Serializable {

  /* This is the key idea that allows us to work around running into
     NotSerializableExceptions. */
  @transient lazy val producer = createProducer()

  def send(topic: String, key: String, value: String): Future[RecordMetadata] = {
    println("inside send method")
    producer.send(new ProducerRecord(topic, key, value))
  }

  def send(topic: String, value: String): Future[RecordMetadata] = {
    // println("inside send method")
    producer.send(new ProducerRecord(topic, value))
  }

}

object MySparkKafkaProducer extends Serializable {

  import scala.collection.JavaConversions._

  def apply(config: Properties): MySparkKafkaProducer = {

    val f = () => {
      val producer = new KafkaProducer[String, String](config)
      sys.addShutdownHook {
        println("calling Closeeeeeeeeeee")
        producer.flush()
        producer.close()
      }
      producer
    }
    new MySparkKafkaProducer(f)
  }



}
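The question does not include AlibabaMainJob.scala, so what follows is only a hedged sketch of how a wrapper like this is typically driven from Spark: broadcast it once and call send inside foreachPartition on the executors. The object name, broker address, input path, and topic are assumptions, not the asker's actual code.

import org.apache.spark.sql.SparkSession

// Hypothetical driver sketch; broker, input path, and topic are placeholders.
object SparkToKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkToKafkaSketch").getOrCreate()

    val props = new java.util.Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // Broadcast the serializable wrapper; each executor builds its own KafkaProducer lazily.
    val kafkaSink = spark.sparkContext.broadcast(MySparkKafkaProducer(props))

    val lines = spark.read.textFile("/path/to/input.txt")
    lines.foreachPartition { partition: Iterator[String] =>
      // The wrapper's shutdown hook flushes and closes the producer when the executor JVM exits.
      partition.foreach(record => kafkaSink.value.send("my-topic", record))
    }
    spark.stop()
  }
}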

1 Answer:

Answer 0 (score: 0)

In your project, add the following dependencies: Spark-Sql, Spark-Core, Spark-Streaming, and Spark-Streaming-Kafka-0-10. You can read the given file into a DataFrame, perform whatever processing you need, and once the processing is finished, write the DataFrame to Kafka as shown below:

resultDF.writeStream.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("topic", "topic1")

Please refer to the documentation here for further details.

Note that I am assuming the result of your processing is stored in a DataFrame named resultDF.