In the code below I am trying to read Avro messages from a Kafka topic, and inside the map method I call KafkaAvroDecoder's fromBytes method, which seems to cause a Task not serializable exception. How can I decode the Avro messages?
public static void main(String[] args) throws Exception {
    Properties decoderProps = new Properties();
    decoderProps.put("schema.registry.url", SCHEMA_REG_URL);
    //decoderProps.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true");
    KafkaAvroDecoder decoder = new KafkaAvroDecoder(new VerifiableProperties(decoderProps));

    SparkSession spark = SparkSession
            .builder()
            .appName("JavaCount1").master("local[2]")
            .config("spark.driver.extraJavaOptions", "-Xss4M")
            .getOrCreate();

    Dataset<Row> ds1 = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", HOSTS)
            .option("subscribe", "systemDec200Message")
            .option("startingOffsets", "earliest")
            .option("maxOffsetsPerTrigger", 1)
            .load();

    Dataset<String> ds2 = ds1.map(m -> {
        GenericData.Record data = (GenericData.Record) decoder.fromBytes((byte[]) m.get(1));
        return "sddasdadasdsadas";
    }, Encoders.STRING());

    StreamingQuery query = ds2.writeStream()
            .outputMode("append")
            .format("console")
            .trigger(ProcessingTime.apply(15))
            .start();

    query.awaitTermination();
}
I get the following exception:
17/04/12 16:51:06 INFO CodeGenerator: Code generated in 329.145119 ms
17/04/12 16:51:07 ERROR StreamExecution: Query [id = 1d56386c-3fba-4978-8565-6b9c880d4fce, runId = b7bbb8d8-b52d-4c14-9dec-bc9cb41f8d77] terminated with error
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:839)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:839)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:371)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
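In miniature, the failure is that the lambda passed to `map` captures the `decoder` object built on the driver, and `KafkaAvroDecoder` is not `Serializable`, so Spark's ClosureCleaner rejects the closure. The effect can be reproduced in plain Java without Spark or Kafka; the `Decoder` class below is a hypothetical stand-in for `KafkaAvroDecoder`, not the real Confluent class:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureCapture {
    // Stand-in for KafkaAvroDecoder: holds state and is deliberately NOT Serializable.
    static class Decoder {
        String decode(byte[] bytes) { return new String(bytes); }
    }

    // Spark requires the mapping function itself to be Serializable.
    interface SerializableFn<T, R> extends Function<T, R>, Serializable {}

    // Same check Spark's ClosureCleaner effectively performs: try to serialize the closure.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (IOException e) {  // NotSerializableException lands here
            return false;
        }
    }

    public static void main(String[] args) {
        Decoder outer = new Decoder();
        // Captures `outer`, dragging the non-serializable Decoder into the closure.
        SerializableFn<byte[], String> capturing = bytes -> outer.decode(bytes);
        // Builds the decoder inside the lambda, so nothing foreign is captured.
        SerializableFn<byte[], String> selfContained = bytes -> new Decoder().decode(bytes);

        System.out.println("capturing serializable: " + isSerializable(capturing));          // false
        System.out.println("self-contained serializable: " + isSerializable(selfContained)); // true
    }
}
```

The second lambda serializes cleanly, which is exactly why the answer below resolves the exception by moving the decoder declaration inside the lambda.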
Answer 0 (score: 0)
After moving the KafkaAvroDecoder declaration inside the lambda scope (inside the map call), the serialization problem went away, but now a different exception shows up at runtime:
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 116, Column 101: No applicable constructor/method found for actual parameters "long"; candidates are: "java.lang.Integer(int)", "java.lang.Integer(java.lang.String)"
    at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
    at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:7559)
    at org.codehaus.janino.UnitCompiler.invokeConstructor(UnitCompiler.java:6505)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4126)
    at org.codehaus.janino.UnitCompiler.access$7600(UnitCompiler.java:185)
    at org.codehaus.janino.UnitCompiler$10.visitNewClassInstance(UnitCompiler.java:3275)
    at org.codehaus.janino.Java$NewClassInstance.accept(Java.java:4085)
    at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
    at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3571)
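A side note on the fix that cured the serialization error: constructing a KafkaAvroDecoder inside a `map` lambda builds one decoder per record, which is wasteful. The usual remedy is `mapPartitions`, where one decoder serves every record of a partition. A plain-Java analogue of that pattern (the `Decoder` class and the iterator-based `mapPartition` helper are hypothetical stand-ins for the Confluent and Spark classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class PerPartitionDecode {
    // Stand-in for KafkaAvroDecoder; counts instances to make the effect visible.
    static class Decoder {
        static int constructed = 0;
        Decoder() { constructed++; }
        String decode(byte[] bytes) { return new String(bytes); }
    }

    // Analogue of Dataset.mapPartitions: the function is invoked once per partition iterator.
    static <T, R> List<R> mapPartition(Iterator<T> partition, Function<Iterator<T>, Iterator<R>> f) {
        List<R> out = new ArrayList<>();
        f.apply(partition).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        Iterator<byte[]> records =
                Arrays.asList("a".getBytes(), "b".getBytes(), "c".getBytes()).iterator();

        List<String> decoded = mapPartition(records, it -> {
            Decoder decoder = new Decoder();  // built once per partition, not once per record
            List<String> result = new ArrayList<>();
            it.forEachRemaining(b -> result.add(decoder.decode(b)));
            return result.iterator();
        });

        System.out.println(decoded + " decoders=" + Decoder.constructed);  // [a, b, c] decoders=1
    }
}
```

Because the decoder is created inside the partition function, the closure still captures nothing non-serializable, while three records share a single decoder instance.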