How do I convert messages from a socket stream source into a custom domain object?

Date: 2017-12-08 00:45:57

Tags: apache-spark apache-spark-sql spark-structured-streaming

I'm stuck. I have Spark Standalone 2.2 running with one worker. I'm using the socket source and trying to read the incoming stream into an object called MicroserviceMessage.

val message = spark.readStream
  .format("socket")
  .option("host", host)
  .option("port", port)
  .load()

val df = message.as[MicroserviceMessage].flatMap(microserviceMessage =>
    microserviceMessage.DataPoints.map(datapoint => (datapoint, microserviceMessage.ServiceProperties, datapoint.EpochUTC)))
  .toDF("datapoint", "properties", "timestamp")

I expect this to be a DataFrame with the columns "datapoint", "properties", and "timestamp".

The data I paste into my netcat terminal looks like this (this is what I'm trying to read as a MicroserviceMessage):

{
  "SystemType": "mytype",
  "SystemGuid": "6c84fb90-12c4-11e1-840d-7b25c5ee775a",
  "TagType": "Raw Tags",
  "ServiceType": "FILTER",
  "DataPoints": [
    {
      "TagName": "013FIC003.PV",
      "EpochUTC": 1505247956001,
      "ItemValue": 25.47177,
      "ItemValueStr": "NORMAL",
      "Quality": "Good",
      "TimeOffset": "P0000"
    },
    {
      "TagName": "013FIC003.PV",
      "EpochUTC": 1505247956010,
      "ItemValue": 26.47177,
      "ItemValueStr": "NORMAL",
      "Quality": "Good",
      "TimeOffset": "P0000"
    }
  ],
  "ServiceProperties": [
    {
      "Key": "OutputTagName",
      "Value": "FI12102.PV_CL"
    },
    {
      "Key": "OutputTagType",
      "Value": "Cleansing Flow Tags"
    }
  ]
}

Instead, what I see is:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`SystemType`' given input columns: [value];

The MicroserviceMessage case class looks like this:

case class DataPoints
(
  TagName: String,
  EpochUTC: Double,
  ItemValue: Double,
  ItemValueStr: String,
  Quality: String,
  TimeOffset: String
)

case class ServiceProperties
(
  Key: String,
  Value: String
)

case class MicroserviceMessage
(
  SystemType: String,
  SystemGuid: String,
  TagType: String,
  ServiceType: String,
  DataPoints: List[DataPoints],
  ServiceProperties: List[ServiceProperties]
)

EDIT: After reading this post, I was able to get things working with:
val messageEncoder = Encoders.bean(classOf[MicroserviceMessage])

val df = message.select($"value").as(messageEncoder).map(
  msmg => (msmg.ServiceType, msmg.SystemGuid)
).toDF("service", "guid")

But this causes a problem once I start sending data:

Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize

Full stacktrace

2 Answers:

Answer 0 (score: 2)

This:

message.as[MicroserviceMessage]

is incorrect, as the error message says:

cannot resolve '`SystemType`' given input columns: [value];

The data coming from a socket stream is just strings (or a string plus a timestamp). To make it usable as a strongly typed Dataset, you have to parse it first, e.g. with org.apache.spark.sql.functions.from_json.
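A minimal sketch of that approach, reusing the question's case classes. A local batch DataFrame stands in for the socket source so the snippet can run on its own; with the real socket source you would keep the question's readStream query and apply the same from_json select. Note that each JSON message must arrive on a single line, since the socket source splits its input on newlines.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.functions.from_json

// Same case classes as in the question.
case class DataPoints(TagName: String, EpochUTC: Double, ItemValue: Double,
                      ItemValueStr: String, Quality: String, TimeOffset: String)
case class ServiceProperties(Key: String, Value: String)
case class MicroserviceMessage(SystemType: String, SystemGuid: String, TagType: String,
                               ServiceType: String, DataPoints: List[DataPoints],
                               ServiceProperties: List[ServiceProperties])

object FromJsonSketch {
  // Parse one-line JSON strings (the socket source's `value` column) into MicroserviceMessage.
  def parse(spark: SparkSession, rows: Seq[String]): Array[MicroserviceMessage] = {
    import spark.implicits._
    // Derive the struct schema from the case class instead of writing it out by hand.
    val schema = Encoders.product[MicroserviceMessage].schema
    rows.toDF("value")
      .select(from_json($"value", schema).as("msg")) // string -> struct
      .select("msg.*")                               // flatten the struct into columns
      .as[MicroserviceMessage]                       // columns -> typed Dataset
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("from_json-sketch").getOrCreate()
    val sample =
      """{"SystemType":"mytype","SystemGuid":"6c84fb90-12c4-11e1-840d-7b25c5ee775a","TagType":"Raw Tags","ServiceType":"FILTER","DataPoints":[{"TagName":"013FIC003.PV","EpochUTC":1505247956001,"ItemValue":25.47177,"ItemValueStr":"NORMAL","Quality":"Good","TimeOffset":"P0000"}],"ServiceProperties":[{"Key":"OutputTagName","Value":"FI12102.PV_CL"}]}"""
    parse(spark, Seq(sample)).foreach(println)
    spark.stop()
  }
}
```

Against the streaming socket source, the same `select(from_json(...)).select("msg.*").as[MicroserviceMessage]` chain applies directly to the `message` DataFrame from the question.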

Answer 1 (score: 0)

The reason for the exception

Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize

is that you compiled your Spark Structured Streaming application with Scala 2.12.4 (or any other release in the 2.12 line), which is not supported in Spark 2.2.

From the scaladoc of scala.runtime.LambdaDeserializer:

This class is only intended to be called by the synthetic $deserializeLambda$ method that the Scala 2.12 compiler will add to classes hosting lambdas.

Spark 2.2 supports Scala 2.11 up to 2.11.12, with 2.11.8 being the most "blessed" version.
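For reference, a minimal build.sbt fragment pinning the build to a supported Scala version; the Spark artifact coordinates are the standard ones for Spark 2.2, and the exact Spark patch version (2.2.0 here) is an assumption.

```scala
// build.sbt — pin Scala to a 2.11.x release supported by Spark 2.2
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, yielding spark-sql_2.11
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
)
```

After changing scalaVersion, run `sbt clean package` so no 2.12-compiled classes linger in the build output.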