I'm stumped. I have Spark Standalone 2.2 running with one worker. I'm using the socket source and trying to read the incoming stream into an object called MicroserviceMessage.
val message = spark.readStream
.format("socket")
.option("host", host)
.option("port", port)
.load()
val df = message.as[MicroserviceMessage].flatMap(microserviceMessage =>
microserviceMessage.DataPoints.map(datapoint => (datapoint, microserviceMessage.ServiceProperties, datapoint.EpochUTC)))
.toDF("datapoint", "properties", "timestamp")
I would expect this to be a DataFrame with the columns "datapoint", "properties", and "timestamp".
The data I paste into my netcat terminal looks like this (this is what I'm trying to read in as a MicroserviceMessage):
{
  "SystemType": "mytype",
  "SystemGuid": "6c84fb90-12c4-11e1-840d-7b25c5ee775a",
  "TagType": "Raw Tags",
  "ServiceType": "FILTER",
  "DataPoints": [
    {
      "TagName": "013FIC003.PV",
      "EpochUTC": 1505247956001,
      "ItemValue": 25.47177,
      "ItemValueStr": "NORMAL",
      "Quality": "Good",
      "TimeOffset": "P0000"
    },
    {
      "TagName": "013FIC003.PV",
      "EpochUTC": 1505247956010,
      "ItemValue": 26.47177,
      "ItemValueStr": "NORMAL",
      "Quality": "Good",
      "TimeOffset": "P0000"
    }
  ],
  "ServiceProperties": [
    {
      "Key": "OutputTagName",
      "Value": "FI12102.PV_CL"
    },
    {
      "Key": "OutputTagType",
      "Value": "Cleansing Flow Tags"
    }
  ]
}
Instead, what I'm seeing is:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`SystemType`' given input columns: [value];
The MicroserviceMessage case class looks like this:
case class DataPoints
(
TagName: String,
EpochUTC: Double,
ItemValue: Double,
ItemValueStr: String,
Quality: String,
TimeOffset: String
)
case class ServiceProperties
(
Key: String,
Value: String
)
case class MicroserviceMessage
(
SystemType: String,
SystemGuid: String,
TagType: String,
ServiceType: String,
DataPoints: List[DataPoints],
ServiceProperties: List[ServiceProperties]
)
EDIT: After reading this post, I was able to get things working with
val messageEncoder = Encoders.bean(classOf[MicroserviceMessage])
val df = message.select($"value").as(messageEncoder).map(
msmg => (msmg.ServiceType, msmg.SystemGuid)
).toDF("service", "guid")
But this causes problems as soon as I start sending data:
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
Answer 0 (score: 2)
This:
message.as[MicroserviceMessage]
is not correct, as the error message tells you:
cannot resolve '`SystemType`' given input columns: [value];
Data coming from a SocketStream is just a string (or a string and a timestamp). To make it usable with a strongly typed Dataset, you have to parse it, for example with org.apache.spark.sql.functions.from_json.
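A sketch of that parsing step, building on the case classes from the question (the schema is derived from MicroserviceMessage itself; this assumes a running SparkSession named spark and is untested against the asker's setup):

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.from_json
import spark.implicits._

// Derive the schema from the existing case class instead of writing it by hand.
val schema = Encoders.product[MicroserviceMessage].schema

val typed = message
  .select(from_json($"value", schema).as("msg")) // parse the raw JSON string
  .select("msg.*")                               // flatten to the case class fields
  .as[MicroserviceMessage]                       // the typed conversion now resolves
```

After this, the flatMap over DataPoints from the question should work as written. One caveat: the socket source treats each line as a separate record, so the pretty-printed JSON pasted into netcat will arrive as many unparsable fragments; each message needs to be sent as a single line.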
Answer 1 (score: 0)
The reason for the exception
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
is that you compiled your Spark Structured Streaming application with Scala 2.12.4 (or any other release in the 2.12 stream), which is not supported in Spark 2.2.
From the scaladoc of scala.runtime.LambdaDeserializer:
This class is only intended to be used by the synthetic $deserializeLambda$ method that the Scala 2.12 compiler adds to classes hosting lambdas.
Spark 2.2 supports Scala up to 2.11.12, with 2.11.8 being the most "blessed" version.
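A minimal build.sbt fragment consistent with this answer (a sketch assuming sbt is the build tool; the Spark version shown matches the question):

```scala
// Spark 2.2 artifacts are published for Scala 2.11 only,
// so the application must be compiled with a 2.11.x compiler.
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
```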