Apache Beam WordCount example with the Spark runner and HDFS fails with "Failed to serialize and deserialize property"

Asked: 2017-06-07 11:33:27

Tags: apache-spark hdfs apache-beam

I am trying to run the Apache Beam v2.0.0 WordCount example on Spark v1.6.x (via YARN v2.7.3) so that it reads from and writes to HDFS (v2.7.3).

Currently, I submit the job with the following command:

bin/spark-submit --class org.apache.beam.examples.WordCount \
  --master yarn --deploy-mode cluster \
  test/word-count-beam-1.0-SNAPSHOT.jar \
    --inputFile=hdfs://test/input/* \
    --output=hdfs://test/output \
    --runner=SparkRunner --sparkMaster=yarn

Unfortunately, the job fails with the following exception:

Failed to serialize and deserialize property 'hdfsConfiguration' with value '[Configuration: /usr/hdp/current/hadoop-client/conf/core-site.xml, /usr/hdp/current/hadoop-client/conf/hdfs-site.xml]'

Here is the full stack trace:

java.lang.IllegalStateException: Failed to serialize the pipeline options.
  at org.apache.beam.runners.spark.translation.SparkRuntimeContext.serializePipelineOptions(SparkRuntimeContext.java:58)
  at org.apache.beam.runners.spark.translation.SparkRuntimeContext.<init>(SparkRuntimeContext.java:41)
  at org.apache.beam.runners.spark.translation.EvaluationContext.<init>(EvaluationContext.java:67)
  at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:196)
  at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:85)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
  at at.tmobile.bigdata.examples.WordCount.main(WordCount.java:184)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:561)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'hdfsConfiguration' with value '[Configuration: /usr/hdp/current/hadoop-client/conf/core-site.xml, /usr/hdp/current/hadoop-client/conf/hdfs-site.xml]'
  at com.fasterxml.jackson.databind.JsonMappingException.fromUnexpectedIOE(JsonMappingException.java:163)
  at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2342)
  at org.apache.beam.runners.spark.translation.SparkRuntimeContext.serializePipelineOptions(SparkRuntimeContext.java:56)
  ... 12 more
Caused by: java.io.IOException: Failed to serialize and deserialize property 'hdfsConfiguration' with value '[Configuration: /usr/hdp/current/hadoop-client/conf/core-site.xml, /usr/hdp/current/hadoop-client/conf/hdfs-site.xml]'
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:710)
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:629)
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:618)
  at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
  at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
  at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
  ... 13 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Conflicting property-based creators: already had [constructor for java.util.ArrayList, annotations: [null]], encountered [constructor for java.util.ArrayList, annotations: [null]]
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:266)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:241)
  at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)
  at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:394)
  at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3169)
  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3062)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2175)
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:708)
  ... 18 more
Caused by: java.lang.IllegalArgumentException: Conflicting property-based creators: already had [constructor for java.util.ArrayList, annotations: [null]], encountered [constructor for java.util.ArrayList, annotations: [null]]
  at com.fasterxml.jackson.databind.deser.impl.CreatorCollector.verifyNonDup(CreatorCollector.java:228)
  at com.fasterxml.jackson.databind.deser.impl.CreatorCollector.addPropertyCreator(CreatorCollector.java:168)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._handleSingleArgumentConstructor(BasicDeserializerFactory.java:487)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._addDeserializerConstructors(BasicDeserializerFactory.java:406)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._constructDefaultValueInstantiator(BasicDeserializerFactory.java:325)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findValueInstantiator(BasicDeserializerFactory.java:266)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.createCollectionDeserializer(BasicDeserializerFactory.java:851)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:390)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:348)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:261)
  ... 25 more

Does anyone know how to fix this?

1 answer:

Answer 0 (score: 1)

I ran into the same problem.

The Jackson modules involved are the ones discovered through java.util.ServiceLoader.load(com.fasterxml.jackson.databind.Module.class), and ParanamerModule is among them.
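To see which modules are actually picked up on a given classpath, a small diagnostic along these lines may help. It is only a sketch: the class name ServiceProviders and the method providerNames are hypothetical, and it assumes it is run with the same classpath as the failing job (otherwise the Jackson service type will not be found and the program just reports that).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical diagnostic helper; not part of Beam, Spark, or Jackson.
final class ServiceProviders {

    // Returns the class names of all providers that ServiceLoader finds
    // for the given service type on the current classpath.
    static List<String> providerNames(Class<?> service) {
        List<String> names = new ArrayList<>();
        for (Object provider : ServiceLoader.load(service)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Defaults to Jackson's Module service type, the one named in the answer.
        String serviceName = args.length > 0 ? args[0] : "com.fasterxml.jackson.databind.Module";
        try {
            providerNames(Class.forName(serviceName)).forEach(System.out::println);
        } catch (ClassNotFoundException e) {
            System.err.println("Service type not on the classpath: " + serviceName);
        }
    }
}
```

If the paranamer module shows up in the printed list, it is being registered automatically along with the other discovered modules.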

The problem lies in the hdfsConfiguration property, which is of type ArrayList<Configuration>.

Excluding the paranamer dependency from the jackson-module-scala dependency in the spark-runner profile helps:

<profiles>
    <profile>
        <id>spark-runner</id>
        <dependencies>
            ...
            <dependency>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>jackson-module-scala_2.10</artifactId>
                <version>2.8.8</version>
                <scope>runtime</scope>
                <!-- Keep jackson-module-scala, but drop its transitive paranamer module -->
                <exclusions>
                    <exclusion>
                        <groupId>com.fasterxml.jackson.module</groupId>
                        <artifactId>jackson-module-paranamer</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            ...
        </dependencies>
    </profile>
</profiles>
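After rebuilding, one rough way to confirm that the exclusion took effect is to check whether the paranamer module class can still be located. The sketch below is an assumption-laden helper, not something from the original answer: ParanamerCheck and isOnClasspath are hypothetical names, and the result only means something if the check runs on the same classpath as the job (the cluster may still supply its own copy of the jar).

```java
// Hypothetical post-build check; the class and method names are illustrative.
final class ParanamerCheck {

    // Fully qualified name of the module class shipped in jackson-module-paranamer.
    static final String PARANAMER_MODULE =
            "com.fasterxml.jackson.module.paranamer.ParanamerModule";

    // Returns true if the named class can be located by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ParanamerCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(PARANAMER_MODULE + " present: " + isOnClasspath(PARANAMER_MODULE));
    }
}
```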

ParanamerModule inspects property annotations and fails on the ArrayList constructors. The module is optional, though, so it can be excluded.
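The "Conflicting property-based creators" message in the trace names two constructors of java.util.ArrayList, and the failing frame is _handleSingleArgumentConstructor. The reflection sketch below only illustrates the JDK side of that: ArrayList has more than one public single-argument constructor. The interpretation that Jackson, once parameter names are available, treats both as competing creators is an assumption drawn from the trace, not something the post confirms. ArrayListConstructors and singleArgConstructors are hypothetical names.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Illustrative only; does not use Jackson or reproduce the failure itself.
final class ArrayListConstructors {

    // Collects the public constructors of a class that take exactly one parameter.
    static List<Constructor<?>> singleArgConstructors(Class<?> type) {
        List<Constructor<?>> result = new ArrayList<>();
        for (Constructor<?> ctor : type.getConstructors()) {
            if (ctor.getParameterCount() == 1) {
                result.add(ctor);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ArrayList declares ArrayList(int) and ArrayList(Collection), so two are listed.
        singleArgConstructors(ArrayList.class).forEach(System.out::println);
    }
}
```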