I'm trying to deserialize Kafka message values into case class instances. (I produce the messages on the other side.)
This code:
import ss.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Encoder, Encoders}

val enc: Encoder[TextRecord] = Encoders.product[TextRecord]
// register a UDF that deserializes the raw Kafka value bytes into a TextRecord
ss.udf.register("deserialize", (bytes: Array[Byte]) =>
  DefSer.deserialize(bytes).asInstanceOf[TextRecord]
)
val inputStream = ss.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", conf.getString("bootstrap.servers"))
  .option("subscribe", topic)
  .option("startingOffsets", "earliest")
  .load()
inputStream.printSchema
val records = inputStream
  .selectExpr("deserialize(value) AS record")
records.printSchema
val rec2 = records.as(enc)
rec2.printSchema
produces the following output:
root
|-- key: binary (nullable = true)
|-- value: binary (nullable = true)
|-- topic: string (nullable = true)
|-- partition: integer (nullable = true)
|-- offset: long (nullable = true)
|-- timestamp: timestamp (nullable = true)
|-- timestampType: integer (nullable = true)
root
|-- record: struct (nullable = true)
| |-- eventTime: timestamp (nullable = true)
| |-- lineLength: integer (nullable = false)
| |-- windDirection: float (nullable = false)
| |-- windSpeed: float (nullable = false)
| |-- gustSpeed: float (nullable = false)
| |-- waveHeight: float (nullable = false)
| |-- dominantWavePeriod: float (nullable = false)
| |-- averageWavePeriod: float (nullable = false)
| |-- mWaveDirection: float (nullable = false)
| |-- seaLevelPressure: float (nullable = false)
| |-- airTemp: float (nullable = false)
| |-- waterSurfaceTemp: float (nullable = false)
| |-- dewPointTemp: float (nullable = false)
| |-- visibility: float (nullable = false)
| |-- pressureTendency: float (nullable = false)
| |-- tide: float (nullable = false)
But when I get to the sink:
val debugOut = rec2.writeStream
  .format("console")
  .option("truncate", "false")
  .start()
debugOut.awaitTermination()
Catalyst complains:
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`eventTime`' given input columns: [record];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
I have tried a number of operations to "pull up the TextRecord", such as calling rec2.map(r => r.getAs[TextRecord](0)) and explode("record"), but I keep running into ClassCastExceptions.
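Written out against the frames above, my attempts look roughly like this:

// Attempt 1: pull the case class back out of the struct column; the column
// actually holds a Row, so this ends in a ClassCastException
val attempt1 = records.map(r => r.getAs[TextRecord](0))

// Attempt 2: explode the struct column; explode expects an array or map
// column, so it does not apply to a single struct either
val attempt2 = records.select(explode(col("record")))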
Answer 0 (score: 1)
The simplest approach is to use import ss.implicits._:
import ss.implicits._

val inputStream = ss.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", conf.getString("bootstrap.servers"))
  .option("subscribe", topic)
  .option("startingOffsets", "earliest")
  .load()

// map straight from the Kafka Row to the case class; the implicit product
// encoder from ss.implicits._ makes this a Dataset[TextRecord]
val records = inputStream.map(row =>
  DefSer.deserialize(row.getAs[Array[Byte]]("value")).asInstanceOf[TextRecord]
)
Using map, the inputStream Row instances are mapped directly to TextRecord (assuming it is a case class), so records is already a Dataset[TextRecord].
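With records typed as Dataset[TextRecord], the console sink from your question should then work unchanged; a minimal sketch:

// records now has the TextRecord fields as top-level columns
val debugOut = records.writeStream
  .format("console")
  .option("truncate", "false")
  .start()
debugOut.awaitTermination()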
As long as you import the SparkSession implicits, you do not need to provide an Encoder class for the case class; Scala derives it implicitly for you.
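For completeness, if you would rather keep the deserialize UDF from your question, flattening the struct column before applying the encoder should also work; an untested sketch:

// "record.*" promotes the struct's fields to top-level columns, which is the
// shape the TextRecord encoder expects
val flattened = inputStream
  .selectExpr("deserialize(value) AS record")
  .select("record.*")
  .as[TextRecord]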