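I designed a NiFi flow that pushes JSON events, serialized in Avro format, to a Kafka topic, and I am trying to consume them in Spark Structured Streaming.
The Kafka part works fine, but Spark Structured Streaming cannot read the Avro events. It fails with the following error.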
[Stage 0:> (0 + 1) / 1]2019-07-19 16:56:57 ERROR Utils:91 - Aborting task
org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -62
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
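One way to read the `-62` in the message: Avro encodes a string's length as a zig-zag varint, and the byte `{` (0x7B, decimal 123) decodes to exactly -62 under that scheme. That would be consistent with the decoder being handed text that starts with `{` rather than Avro binary, although this is an inference from the number, not something the log states. A minimal sketch of the arithmetic (the helper name `zigZagDecode` is made up for illustration):

```scala
// Zig-zag decoding as used by Avro for int/long values.
// A byte below 0x80 has no continuation bit, so it is a complete one-byte varint.
def zigZagDecode(n: Long): Long = (n >>> 1) ^ -(n & 1)

val firstByte: Byte = "{".getBytes("UTF-8")(0) // 0x7B = 123
val decodedLength: Long = zigZagDecode(firstByte.toLong)

println(decodedLength) // prints -62
```

If the first field of the record (`host`, a string) is read from a payload whose first byte is `{`, the length prefix would come out as -62, which matches the exception text.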
Spark code
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("Spark-Kafka-Integration").master("local").getOrCreate()
// Implicits can only be imported once the session exists
import spark.implicits._

// Avro schema (as JSON text) used to decode the Kafka message value
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("schema.avsc")))

// Raw Kafka records; the "value" column holds the message bytes
val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "host:port").option("subscribe", "topic_name").load()

// Decode the value as Avro and flatten the record fields into columns
val df1 = df.select(from_avro(col("value"), jsonFormatSchema).as("data")).select("data.*")

df1.writeStream.format("console").option("truncate", "false").start()
Schema used in Spark
{
"type": "record",
"name": "kafka_demo_new",
"fields": [
{
"name": "host",
"type": "string"
},
{
"name": "event",
"type": "string"
},
{
"name": "connectiontype",
"type": "string"
},
{
"name": "user",
"type": "string"
},
{
"name": "eventtimestamp",
"type": "string"
}
]
}
Sample topic data in Kafka
{"host":"localhost","event":"Qradar_Demo","connectiontype":"tcp/ip","user":"user","eventtimestamp":"2018-05-24 23:15:07"}
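The sample above looks like plain JSON text. To check what the message bytes actually are before handing them to `from_avro`, the leading bytes can be inspected. The sketch below is a rough heuristic, not a full format detector. `describePayload` is a hypothetical helper, and the cases it tests for (JSON text starting with `{`, an Avro object container file starting with the magic bytes `Obj` followed by 1, and the Confluent wire format starting with a zero byte and a 4-byte schema id) are assumptions about what a producer might have written.

```scala
// Hypothetical helper: guess the encoding of a Kafka message value from its leading bytes.
def describePayload(bytes: Array[Byte]): String = {
  // Avro object container files start with the ASCII bytes "Obj" followed by the byte 1.
  val avroContainerMagic: Array[Byte] = "Obj".getBytes("US-ASCII") :+ 1.toByte

  if (bytes.isEmpty) "empty"
  else if (bytes.head == '{'.toByte) "looks like JSON text"
  else if (bytes.take(4).sameElements(avroContainerMagic)) "Avro object container file (schema embedded)"
  else if (bytes.head == 0 && bytes.length >= 5) "possibly Confluent wire format (magic byte 0 + 4-byte schema id)"
  else "possibly raw Avro binary (or something else)"
}

val sample = """{"host":"localhost","event":"Qradar_Demo"}""".getBytes("UTF-8")
println(describePayload(sample)) // prints: looks like JSON text
```

`from_avro` in Spark 2.4 expects raw Avro binary for the given schema, so any of the first three outcomes would mean the value needs different handling before or instead of `from_avro`.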
Version information:
HDP - 3.1.0
Kafka - 2.0.0
Spark - 2.4.0
Thanks for your help.
Answer 0 (score: 0)
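I had a similar problem and found that Kafka and KSQL were using different Avro versions, which caused other components to complain.
This may be your case as well. Take a look at: https://github.com/confluentinc/ksql/issues/1742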
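Separately from a version mismatch, if the topic turns out to hold plain JSON text (as the sample in the question suggests) rather than Avro binary, one option is to parse the value with `from_json` instead of `from_avro`. The following is only a sketch under that assumption: `eventSchema` and `readJsonEvents` are names made up here, the field list is copied from the schema in the question, and the `spark-sql-kafka-0-10` package is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Same five string fields as the Avro schema in the question.
val eventSchema = StructType(Seq(
  StructField("host", StringType),
  StructField("event", StringType),
  StructField("connectiontype", StringType),
  StructField("user", StringType),
  StructField("eventtimestamp", StringType)
))

// Hypothetical helper: read the topic and parse each value as a JSON string.
def readJsonEvents(spark: SparkSession, servers: String, topic: String): DataFrame =
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", servers)
    .option("subscribe", topic)
    .load()
    .select(from_json(col("value").cast("string"), eventSchema).as("data"))
    .select("data.*")
```

With a session in scope, it could be used as `readJsonEvents(spark, "host:port", "topic_name").writeStream.format("console").option("truncate", "false").start()`. If the events are meant to stay in Avro, the alternative is to fix the producer side so that it really writes Avro binary matching the schema.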