How to use Delta Lake with Spark 2.4.4

Time: 2020-09-30 06:22:16

Tags: apache-spark delta-lake

I am using Spark 2.4.4. When I start the pyspark shell, I specify the Delta Lake and Jackson packages like this:

    pyspark --packages io.delta:delta-core_2.11:0.6.1,com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
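
The same packages and confs can also be set programmatically when building the SparkSession; here is a minimal sketch, assuming the same artifact coordinates as the shell command above (the app name is just a placeholder):

    # Minimal sketch: the same Delta Lake setup as the shell flags above, done in code.
    # Note: spark.jars.packages only takes effect if set before the driver JVM starts,
    # i.e. in a fresh Python process rather than an already-running shell.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("delta-test")  # placeholder app name
        .config("spark.jars.packages",
                "io.delta:delta-core_2.11:0.6.1,"
                "com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7.1")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )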

But then I get the following error:

    >>> data.write.format("delta").save("/tmp/delta-table")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 738, in save
        self._jwrite.save(path)
      File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o91.save.
    : java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper.com$fasterxml$jackson$module$scala$experimental$ScalaObjectMapper$_setter_$com$fasterxml$jackson$module$scala$experimental$ScalaObjectMapper$$MAP_$eq(Ljava/lang/Class;)V
        at com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper$class.$init$(ScalaObjectMapper.scala:331)
        at org.apache.spark.sql.delta.util.JsonUtils$$anon$1.<init>(JsonUtils.scala:27)
        at org.apache.spark.sql.delta.util.JsonUtils$.<init>(JsonUtils.scala:27)
        at org.apache.spark.sql.delta.util.JsonUtils$.<clinit>(JsonUtils.scala)
        at org.apache.spark.sql.delta.DeltaOperations$Write$$anonfun$1.apply(DeltaOperations.scala:58)
        at org.apache.spark.sql.delta.DeltaOperations$Write$$anonfun$1.apply(DeltaOperations.scala:58)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.sql.delta.DeltaOperations$Write.<init>(DeltaOperations.scala:58)
        at org.apache.spark.sql.delta.commands.WriteIntoDelta$$anonfun$run$1.apply(WriteIntoDelta.scala:66)
        at org.apache.spark.sql.delta.commands.WriteIntoDelta$$anonfun$run$1.apply(WriteIntoDelta.scala:64)
        at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:188)
        at org.apache.spark.sql.delta.commands.WriteIntoDelta.run(WriteIntoDelta.scala:64)
        at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:134)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
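
A `NoSuchMethodError` raised during class initialization like this usually means a different jackson-module-scala version already on the cluster classpath is shadowing the one pulled in via `--packages`. A minimal diagnostic sketch, assuming a running pyspark shell where `spark` is defined, to check which jar the conflicting class is actually loaded from:

    # Diagnostic sketch: find which jar provides the conflicting class.
    # Assumes a running pyspark shell where `spark` is already defined.
    jvm = spark.sparkContext._jvm
    loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
    cls = loader.loadClass(
        "com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper")
    # getCodeSource() can be None for bootstrap classes; for application jars
    # it reports the jar file the class was loaded from.
    src = cls.getProtectionDomain().getCodeSource()
    if src is not None:
        print(src.getLocation().toString())

If the printed location points at a jar bundled with the HDP Spark distribution rather than the jackson artifact resolved by `--packages`, the bundled copy is taking precedence and causing the signature mismatch.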

0 Answers:

No answers yet.