spark stream kafka: Unknown error fetching data for topic-partition

Date: 2018-09-28 07:51:55

Tags: apache-spark apache-kafka apache-spark-sql spark-structured-streaming

I am trying to read a Kafka topic from a Spark cluster using the Structured Streaming API with the Kafka integration for Spark.


Creating the Kafka stream:

val sparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("some-app")
  .getOrCreate()


import sparkSession.implicits._

val dataFrame = sparkSession
  .readStream
  .format("kafka")
  .option("subscribePattern", "preprod-*")
  .option("kafka.bootstrap.servers", "<brokerUrl>:9094")
  .option("kafka.ssl.protocol", "TLS")
  .option("kafka.security.protocol", "SSL")
  .option("kafka.ssl.key.password", secretPassword)
  .option("kafka.ssl.keystore.location", "/tmp/xyz.jks")
  .option("kafka.ssl.keystore.password", secretPassword)
  .option("kafka.ssl.truststore.location", "/abc.jks")
  .option("kafka.ssl.truststore.password", secretPassword)
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
  .writeStream
  .format("console")
  .start()
  .awaitTermination()
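One thing worth double-checking (my assumption, not something the question confirms is the problem): `subscribePattern` is matched as a Java regex against topic names, so `"preprod-*"` means the literal `preprod` followed by zero or more hyphens, not a prefix wildcard; a prefix match would be `"preprod-.*"`. A minimal pure-Scala check of the regex semantics:

```scala
import java.util.regex.Pattern

// subscribePattern values are full-matched as Java regexes against topic names.
object PatternCheck extends App {
  // "preprod-*" = literal "preprod" plus zero or more '-' characters
  println(Pattern.matches("preprod-*", "preprod-events"))  // false
  println(Pattern.matches("preprod-*", "preprod"))         // true
  // "preprod-.*" matches every topic whose name starts with "preprod-"
  println(Pattern.matches("preprod-.*", "preprod-events")) // true
}
```

If no topic actually matches the pattern, the stream simply reads nothing, so this would not by itself explain a fetch error, but it is cheap to rule out.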

I run it with the command:

/usr/local/spark/bin/spark-submit \
  --packages "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.1,org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1" \
  myjar.jar

and get the following error:

Unknown error fetching data for topic-partition

1 Answer:

Answer 0 (score: 0)

What is your Kafka broker version, and how were these messages produced?

If these messages carry record headers (https://issues.apache.org/jira/browse/KAFKA-4208), you will need a Kafka 0.11+ client to consume them, because older Kafka clients cannot read such messages. If that is the case, you can use the following command:

/usr/local/spark/bin/spark-submit \
  --packages "org.apache.kafka:kafka-clients:0.11.0.3,org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1" \
  myjar.jar
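As a side note beyond the original answer: later Spark versions (3.0+) can expose such record headers directly through the Kafka source's `includeHeaders` option, which adds a `headers` column to the streaming DataFrame. A configuration sketch, reusing the question's `sparkSession` and a hypothetical prefix pattern:

```scala
// Sketch only: requires Spark 3.0+ and the matching spark-sql-kafka package.
// "includeHeaders" adds a `headers` column of array<struct<key:string,value:binary>>.
val withHeaders = sparkSession
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "<brokerUrl>:9094")
  .option("subscribePattern", "preprod-.*")
  .option("includeHeaders", "true")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "headers")
```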