Spark Structured Streaming (Java): Task not serializable

Date: 2017-04-12 22:06:15

Tags: java apache-spark spark-streaming

In the code below I am trying to read Avro messages from a Kafka topic, and in the map method I use the KafkaAvroDecoder fromBytes method, which seems to cause a Task not serializable exception. How can I decode the Avro messages?

public static void main(String[] args) throws Exception {

    Properties decoderProps = new Properties();
    decoderProps.put("schema.registry.url", SCHEMA_REG_URL);
    //decoderProps.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true");

    KafkaAvroDecoder decoder = new KafkaAvroDecoder(new VerifiableProperties(decoderProps));


    SparkSession spark = SparkSession
        .builder()
        .appName("JavaCount1").master("local[2]")
        .config("spark.driver.extraJavaOptions", "-Xss4M")
        .getOrCreate();

    Dataset<Row> ds1 = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", HOSTS)
        .option("subscribe", "systemDec200Message")
        .option("startingOffsets", "earliest")
        .option("maxOffsetsPerTrigger", 1)
        .load();

    Dataset<String> ds2 = ds1.map(m -> {
        GenericData.Record data = (GenericData.Record) decoder.fromBytes((byte[]) m.get(1));

        return "sddasdadasdsadas";
    }, Encoders.STRING());
    StreamingQuery query = ds2.writeStream()
        .outputMode("append")
        .format("console")
        .trigger(ProcessingTime.apply(15))
        .start();

    query.awaitTermination();
}

I get an exception like the following:

17/04/12 16:51:06 INFO CodeGenerator: Code generated in 329.145119 ms
17/04/12 16:51:07 ERROR StreamExecution: Query [id = 1d56386c-3fba-4978-8565-6b9c880d4fce, runId = b7bbb8d8-b52d-4c14-9dec-bc9cb41f8d77] terminated with error
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:839)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:839)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:371)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

1 Answer:

Answer 0 (score: 0):

After moving the KafkaAvroDecoder declaration inside the lambda scope (inside the map call), the serialization problem went away, but now another exception shows up at runtime:
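The mechanism behind both the error and this fix can be reproduced with plain JDK serialization, no Spark needed. In this sketch, FakeDecoder, SerFn, and serializes are hypothetical names standing in for KafkaAvroDecoder, Spark's serializable closures, and (roughly) the ClosureCleaner serializability check: a serializable lambda that captures a non-serializable outer object fails to serialize, while one that constructs the object inside its own body captures nothing foreign and serializes fine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureSerializationDemo {

    // Hypothetical stand-in for KafkaAvroDecoder: deliberately NOT Serializable.
    static class FakeDecoder {
        String fromBytes(byte[] bytes) {
            return new String(bytes);
        }
    }

    // A lambda type that Java serialization will accept as a target,
    // analogous to the serializable closures Spark ships to executors.
    interface SerFn extends Function<byte[], String>, Serializable {}

    // Returns true if the object survives standard Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (IOException e) {
            return false; // NotSerializableException lands here
        }
    }

    public static void main(String[] args) {
        FakeDecoder outer = new FakeDecoder(); // created outside the lambda, like on the driver

        // Captures the non-serializable decoder -> serialization fails.
        SerFn capturing = b -> outer.fromBytes(b);

        // Builds the decoder inside the lambda body -> nothing foreign is captured.
        SerFn selfContained = b -> new FakeDecoder().fromBytes(b);

        System.out.println("capturing serializes:     " + serializes(capturing)); // false
        System.out.println("selfContained serializes: " + serializes(selfContained)); // true
    }
}
```

Note the trade-off the answer accepts: constructing the decoder inside the lambda means one instance per invocation (or per partition, if hoisted into mapPartitions), rather than one shared driver-side instance.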

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 116, Column 101: No applicable constructor/method found for actual parameters "long"; candidates are: "java.lang.Integer(int)", "java.lang.Integer(java.lang.String)"
    at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
    at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:7559)
    at org.codehaus.janino.UnitCompiler.invokeConstructor(UnitCompiler.java:6505)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4126)
    at org.codehaus.janino.UnitCompiler.access$7600(UnitCompiler.java:185)
    at org.codehaus.janino.UnitCompiler$10.visitNewClassInstance(UnitCompiler.java:3275)
    at org.codehaus.janino.Java$NewClassInstance.accept(Java.java:4085)
    at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
    at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3571)