I want to save and load a machine learning model on S3.
This is what I did:
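import com.amazonaws.auth.profile.ProfileCredentialsProvider
import org.apache.spark.ml.tuning.TrainValidationSplitModel

// Read the AWS credentials from the local profile and hand them to Hadoop's S3 filesystem
val credentials = new ProfileCredentialsProvider()
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", credentials.getCredentials.getAWSAccessKeyId)
hadoopConf.set("fs.s3.awsSecretAccessKey", credentials.getCredentials.getAWSSecretKey)
TrainValidationSplitModel.load(s"s3://model_path")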
When I run it locally, it works.
However, when I run it on the cluster, I get the following error:
Serialization trace:
fields (org.apache.spark.sql.types.StructType)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:312)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:324)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.types.StructField[]
Note: To register this class use: kryo.register(org.apache.spark.sql.types.StructField[].class);
at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:76)
... 10 more
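You might say: "Easy, you just need to register the org.apache.spark.sql.types.StructField class with kryo.register(SomeClass.class);"
However, after registering nearly 15 classes, Kryo asked me to register a private class (one whose access is restricted to the spark package).
How can I solve this?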
Answer 0 (score: 1)
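The error has nothing to do with saving or loading the model.
It is caused by spark.kryo.registrationRequired, which you have set to true in your configuration. If that is the case, it behaves as follows: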
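Whether to require registration with Kryo. If set to 'true', Kryo will throw an exception if an unregistered class is serialized. If set to false (the default), Kryo will write unregistered class names along with each object. Writing class names can cause significant performance overhead, so enabling this option can enforce strictly that a user has not omitted classes from registration.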
Personally, I would recommend using it only for diagnostics and disabling it when you actually run the application.
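
As a rough illustration of that recommendation, here is a minimal sketch of a SparkConf that keeps Kryo but no longer requires registration. The application name is hypothetical, and the three registered classes are only the ones visible in the stack trace above; your real configuration will differ.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.types.{StructField, StructType}

// Sketch only: the app name below is a made-up placeholder.
val conf = new SparkConf()
  .setAppName("load-model-from-s3")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // "false" is the default: unregistered classes are written with their
  // class name instead of raising "Class is not registered".
  .set("spark.kryo.registrationRequired", "false")
  // Optional: registering frequently serialized classes still avoids
  // the overhead of writing their names with every object.
  .registerKryoClasses(Array(
    classOf[StructType],
    classOf[StructField],
    classOf[Array[StructField]] // the StructField[] named in the exception
  ))
```

The same setting can be passed at submit time with --conf spark.kryo.registrationRequired=false. If you keep the flag at true for diagnostics, a class that is not accessible from your code can still be looked up by the fully qualified name printed in the exception, using Class.forName, and passed to registerKryoClasses.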