Hi all,
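I am processing streaming JSON data with Spark 2.4.0 and Kafka 2.2 (both built for Scala 2.11). I have followed a few links:
The messages in Kafka (randomly generated test data) are JSON records like these: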
{"yaw": 0, "height": 2053.800174349187, "timestamp": "1555465965", "v": 1, "longitude": "121.645261", "jet_number": 15, "acc": 1, "latitude": "30.050000"}
{"yaw": 0, "height": 2023.4573189529592, "timestamp": "1555465966", "v": 1, "longitude": "87.656227", "jet_number": 11, "acc": 1, "latitude": "30.050000"}
{"yaw": 0, "height": 2005.5774022979028, "timestamp": "1555465967", "v": 1, "longitude": "124.613970", "jet_number": 3, "acc": 1, "latitude": "30.050000"}
{"yaw": 0, "height": 2074.936351669867, "timestamp": "1555465968", "v": 1, "longitude": "131.765794", "jet_number": 15, "acc": 1, "latitude": "30.050000"}
{"yaw": 0, "height": 2030.5305980070775, "timestamp": "1555465969", "v": 1, "longitude": "126.936592", "jet_number": 12, "acc": 1, "latitude": "30.050000"}
{"yaw": 0, "height": 2024.540075254924, "timestamp": "1555465970", "v": 1, "longitude": "121.432735", "jet_number": 12, "acc": 1, "latitude": "30.050000"}
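One quick way to check what types these records actually carry is to let Spark infer a schema from a couple of the sample lines. This is a minimal batch sketch, assuming Spark 2.4 on the classpath and a local master; the object name `InferJetSchema` is made up for illustration. It does not touch Kafka at all.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.StructType

object InferJetSchema {
  // Two of the sample Kafka messages above, copied verbatim.
  val samples: Seq[String] = Seq(
    """{"yaw": 0, "height": 2053.800174349187, "timestamp": "1555465965", "v": 1, "longitude": "121.645261", "jet_number": 15, "acc": 1, "latitude": "30.050000"}""",
    """{"yaw": 0, "height": 2023.4573189529592, "timestamp": "1555465966", "v": 1, "longitude": "87.656227", "jet_number": 11, "acc": 1, "latitude": "30.050000"}"""
  )

  // Let Spark's JSON reader infer a schema from the raw strings (batch mode).
  def inferredSchema(spark: SparkSession): StructType = {
    import spark.implicits._
    val ds: Dataset[String] = samples.toDS()
    spark.read.json(ds).schema
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("infer-jet-schema").getOrCreate()
    // Prints the inferred field names and types; height should show up as double.
    inferredSchema(spark).printTreeString()
    spark.stop()
  }
}
```

Comparing this output with a hand-written schema field by field is a cheap way to catch a type that does not match the data.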
My code:
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{DataTypes,StructType}
val schema = new StructType()
.add("acc",DataTypes.IntegerType)
.add("v",DataTypes.IntegerType)
.add("longitude",DataTypes.StringType)
.add("jet_number",DataTypes.IntegerType)
.add("timestamp",DataTypes.StringType)
.add("latitude",DataTypes.StringType)
.add("height",DataTypes.IntegerType)
.add("yaw",DataTypes.IntegerType)
val df = spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "ip:9092")
.option("kafka.partition.assignment.strategy","org.apache.kafka.clients.consumer.RangeAssignor")
.option("subscribe", "test")
.load()
df.printSchema
val jetDF = df.selectExpr("CAST(value AS STRING)")
jetDF.printSchema
val jdf = jetDF.select(from_json($"value", schema).as("data")).select("data.*")
jdf.printSchema
jdf.writeStream
.outputMode("append")
.format("console")
.start()
.awaitTermination()
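To separate the parsing step from Kafka, the same `from_json` call can be run on a static one-row DataFrame built from a sample message. This is a minimal sketch, assuming Spark 2.4; the object name `FromJsonBatchCheck` and the idea of making the `height` type a parameter are illustrative only. It parses the record once with the schema exactly as above (height as IntegerType) and once with height as DoubleType, since the sample values such as 2053.800174349187 are not integers.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DataType, DataTypes, StructType}

object FromJsonBatchCheck {
  // First sample message from the question, copied verbatim.
  val sample: String =
    """{"yaw": 0, "height": 2053.800174349187, "timestamp": "1555465965", "v": 1, "longitude": "121.645261", "jet_number": 15, "acc": 1, "latitude": "30.050000"}"""

  // The question's schema, with the type of "height" left as a parameter.
  def jetSchema(heightType: DataType): StructType = new StructType()
    .add("acc", DataTypes.IntegerType)
    .add("v", DataTypes.IntegerType)
    .add("longitude", DataTypes.StringType)
    .add("jet_number", DataTypes.IntegerType)
    .add("timestamp", DataTypes.StringType)
    .add("latitude", DataTypes.StringType)
    .add("height", heightType)
    .add("yaw", DataTypes.IntegerType)

  // Same parsing steps as the streaming job, but on a static DataFrame.
  def parse(spark: SparkSession, schema: StructType): DataFrame = {
    import spark.implicits._
    Seq(sample).toDF("value")
      .select(from_json(col("value"), schema).as("data"))
      .select("data.*")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("from-json-check").getOrCreate()
    parse(spark, jetSchema(DataTypes.IntegerType)).show(false) // schema as in the question
    parse(spark, jetSchema(DataTypes.DoubleType)).show(false)  // height declared as a double
    spark.stop()
  }
}
```

If the first table comes back with nulls and the second shows the values, the schema rather than the Kafka source is what needs fixing.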
After running this code I don't get the expected output; every column comes back null:
Spark context available as 'sc' (master = local[*], app id = local-1558671649422).
Spark session available as 'spark'.
root
|-- key: binary (nullable = true)
|-- value: binary (nullable = true)
|-- topic: string (nullable = true)
|-- partition: integer (nullable = true)
|-- offset: long (nullable = true)
|-- timestamp: timestamp (nullable = true)
|-- timestampType: integer (nullable = true)
root
|-- value: string (nullable = true)
root
|-- acc: integer (nullable = true)
|-- v: integer (nullable = true)
|-- longitude: string (nullable = true)
|-- jet_number: integer (nullable = true)
|-- timestamp: string (nullable = true)
|-- latitude: string (nullable = true)
|-- height: integer (nullable = true)
|-- yaw: integer (nullable = true)
-------------------------------------------
Batch: 0
-------------------------------------------
+---+---+---------+----------+---------+--------+------+---+
|acc| v|longitude|jet_number|timestamp|latitude|height|yaw|
+---+---+---------+----------+---------+--------+------+---+
+---+---+---------+----------+---------+--------+------+---+
-------------------------------------------
Batch: 1
-------------------------------------------
+----+----+---------+----------+---------+--------+------+----+
| acc| v|longitude|jet_number|timestamp|latitude|height| yaw|
+----+----+---------+----------+---------+--------+------+----+
|null|null| null| null| null| null| null|null|
|null|null| null| null| null| null| null|null|
|null|null| null| null| null| null| null|null|
|null|null| null| null| null| null| null|null|
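The all-null rows are consistent with a schema mismatch: `height` arrives as a fractional number (for example 2053.800174349187) but is declared as IntegerType. As far as I understand the Spark 2.4 behaviour, `from_json` in its default PERMISSIVE mode turns a record into a row of nulls when a field cannot be converted, which matches the output above. Below is a sketch of the streaming job with `height` declared as DoubleType and with `mode` set to FAILFAST, so a mismatch raises an error instead of silently producing nulls. It assumes Spark 2.4 or later (FAILFAST for `from_json` is not available in older versions, as far as I know); the broker address `ip:9092` and topic `test` are the placeholders from the question, and the object name `JetStream` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DataTypes, StructType}

object JetStream {
  // Same fields as the question; only "height" changes from IntegerType to DoubleType.
  val schema: StructType = new StructType()
    .add("acc", DataTypes.IntegerType)
    .add("v", DataTypes.IntegerType)
    .add("longitude", DataTypes.StringType)
    .add("jet_number", DataTypes.IntegerType)
    .add("timestamp", DataTypes.StringType)
    .add("latitude", DataTypes.StringType)
    .add("height", DataTypes.DoubleType)
    .add("yaw", DataTypes.IntegerType)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jet-stream").getOrCreate()

    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "ip:9092") // placeholder from the question
      .option("kafka.partition.assignment.strategy", "org.apache.kafka.clients.consumer.RangeAssignor")
      .option("subscribe", "test")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")
      // FAILFAST (for debugging): a record that does not fit the schema fails
      // the query with an error instead of becoming a row of nulls.
      .select(col("value"), from_json(col("value"), schema, Map("mode" -> "FAILFAST")).as("data"))
      // Keep the raw JSON next to the parsed columns to compare them by eye.
      .select("value", "data.*")

    parsed.writeStream
      .outputMode("append")
      .format("console")
      .option("truncate", "false")
      .start()
      .awaitTermination()
  }
}
```

FAILFAST stops the whole query on the first bad message, so once the schema is right it probably makes sense to drop the `mode` option again and go back to the default behaviour.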
Can anyone help me? I have been struggling with this for two days. :(