AWS Glue: unable to run the Snowflake Python connection as in the example https://www.snowflake.com/blog/how-to-use-aws-glue-with-snowflake/

Time: 2019-10-15 19:38:50

Tags: snowflake-data-warehouse

(Posting the following thread to help other Snowflake users find out what works with AWS Glue)


I am trying to implement a Snowflake connection in my AWS Glue job, as described in the example:

https://www.snowflake.com/blog/how-to-use-aws-glue-with-snowflake/

I am using the latest versions:

  • spark-snowflake_2.12-2.5.2-spark_2.4
  • snowflake-jdbc-3.9.1
  • Glue version: Spark 2.4, Python 3 (Glue version 1.0)

But I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o75.load.
: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at net.snowflake.spark.snowflake.Parameters$MergedParameters.<init>(Parameters.scala:208)
at net.snowflake.spark.snowflake.Parameters$.mergeParameters(Parameters.scala:202)
at net.snowflake.spark.snowflake.DefaultSource.createRelation(DefaultSource.scala:59)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
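For reference, the call that fails above is `DataFrameReader.load()` (`o75.load`). A read in the style of the Snowflake blog example looks roughly like the sketch below. It is a minimal sketch, not the blog's exact code: the account URL, credentials, warehouse and table name are placeholders, the helper names are hypothetical, and `spark` is assumed to be the session from `glueContext.spark_session`. The option keys (`sfURL`, `sfUser`, `sfPassword`, `sfDatabase`, `sfSchema`, `sfWarehouse`, `dbtable`) and the source name `net.snowflake.spark.snowflake` are the ones the spark-snowflake connector documents.

```python
from typing import Any, Dict

# Data source name registered by the spark-snowflake connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def build_sf_options(account_url: str, user: str, password: str,
                     database: str, schema: str, warehouse: str) -> Dict[str, str]:
    """Map connection settings onto the option keys the connector reads (hypothetical helper).

    In a real job the password should come from a job argument or a secret
    store, not from source code.
    """
    return {
        "sfURL": account_url,  # e.g. "<account>.snowflakecomputing.com"
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_snowflake_table(spark: Any, sf_options: Dict[str, str], table: str) -> Any:
    """Load a Snowflake table as a Spark DataFrame (hypothetical helper).

    `spark` is a SparkSession; in Glue it is usually glueContext.spark_session.
    The NoSuchMethodError above is thrown from inside this .load() call
    (DefaultSource.createRelation in the trace), before any rows are read.
    """
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**sf_options)
        .option("dbtable", table)
        .load()
    )
```

Usage would be along the lines of `df = read_snowflake_table(spark, build_sf_options(...), "MY_TABLE")` followed by `df.show()`.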

1 answer:

Answer 0 (score: 0)

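Answer 1: Maybe I'm being a bit basic here, but I tend to assume errors like this are caused by version incompatibilities. What is your Scala version? Here is some general information that may help with the setup: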

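I believe AWS Glue now supports Spark 2.4.3, but that support is still quite new and may not be fully tested. You are welcome to try it, or you can try Spark 2.2.1, which we also know AWS Glue supports.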

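Because we know AWS Glue supports Spark 2.2.1, I'll go into detail on that version here. The first link is the AWS Glue documentation for this version, and the second is where you can download the spark-snowflake connector for it.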

For Spark 2.2.1, I believe you need Scala 2.11, which you can find here:

You can use the latest Snowflake JDBC driver, which you can download here:

Hope this helps. Let us know if you still get the same error.
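A `NoSuchMethodError` on `scala.Product.$init$`, as in the question, is the usual symptom of a library compiled for Scala 2.12 running on a Scala 2.11 runtime: Scala 2.12 compiles trait initializers as static `$init$` methods that the 2.11 standard library does not have. The connector's file name encodes the versions: in `spark-snowflake_2.12-2.5.2-spark_2.4`, `2.12` is the Scala binary version, `2.5.2` the connector release and `2.4` the Spark line. Spark 2.4.3, which Glue 1.0 ships, is built for Scala 2.11 by default, which is consistent with the `_2.11` connector that worked later in this thread. Assuming the naming pattern seen here, a pre-flight check could look like this (all function names are hypothetical):

```python
import re
from typing import List, NamedTuple, Optional

# Matches names such as
#   spark-snowflake_2.12-2.5.2-spark_2.4      (Scala 2.12, connector 2.5.2, Spark 2.4)
#   spark-snowflake_2.11-2.4.8.jar            (older names omit the Spark suffix)
_CONNECTOR_RE = re.compile(
    r"spark-snowflake_(?P<scala>\d+\.\d+)-(?P<connector>\d+(?:\.\d+)*)"
    r"(?:-spark_(?P<spark>\d+\.\d+))?(?:\.jar)?$"
)


class ConnectorJar(NamedTuple):
    scala: str            # Scala binary version the jar was compiled for
    connector: str        # spark-snowflake release
    spark: Optional[str]  # Spark line, if the name encodes one


def parse_connector_jar(name: str) -> ConnectorJar:
    """Split a spark-snowflake jar name (or S3 path) into its version parts."""
    m = _CONNECTOR_RE.search(name)
    if m is None:
        raise ValueError(f"not a spark-snowflake artifact name: {name!r}")
    return ConnectorJar(m.group("scala"), m.group("connector"), m.group("spark"))


def major_minor(version: str) -> str:
    """Reduce '2.11.12' to '2.11' or '2.4.3' to '2.4'."""
    return ".".join(version.split(".")[:2])


def check_connector(jar_name: str, runtime_scala: str, runtime_spark: str) -> List[str]:
    """List version mismatches between a connector jar and the runtime; empty means none found."""
    jar = parse_connector_jar(jar_name)
    problems = []
    if jar.scala != major_minor(runtime_scala):
        problems.append(f"connector built for Scala {jar.scala}, runtime is Scala {runtime_scala}")
    if jar.spark is not None and jar.spark != major_minor(runtime_spark):
        problems.append(f"connector built for Spark {jar.spark}, runtime is Spark {runtime_spark}")
    return problems
```

Inside a Glue job, the runtime Spark version is available as `spark.version`; the Scala version can be read through the private py4j gateway with `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`. Because `_jvm` is an internal attribute, treat that as a debugging aid rather than a stable API.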

Response 2: Following your suggestion, I was able to connect to the Snowflake database and load a table into a DataFrame.

I am using these versions:

1) Spark 2.2, Python 2, Glue version 0.9

2) spark-snowflake_2.11-2.2.1.jar

3) snowflake-jdbc-3.2.7

df.printSchema() works fine, but df.show() fails with the error below. I am also unable to save this DataFrame to an S3 bucket.

py4j.protocol.Py4JJavaError: An error occurred while calling o75.showString.
: java.lang.NoClassDefFoundError: net/snowflake/client/jdbc/internal/snowflake/common/core/S3FileEncryptionMaterial
at net.snowflake.spark.snowflake.ConnectorSFStageManager.encMat$lzycompute(ConnectorSFStageManager.scala:203)
at net.snowflake.spark.snowflake.ConnectorSFStageManager.encMat(ConnectorSFStageManager.scala:201)
at net.snowflake.spark.snowflake.ConnectorSFStageManager.masterKey$lzycompute(ConnectorSFStageManager.scala:231)
at net.snowflake.spark.snowflake.ConnectorSFStageManager.masterKey(ConnectorSFStageManager.scala:230)
at net.snowflake.spark.snowflake.SnowflakeRDD.<init>(SnowflakeRDD.scala:60)
at net.snowflake.spark.snowflake.SnowflakeRelation.getRDDFromS3(SnowflakeRelation.scala:189)
at net.snowflake.spark.snowflake.SnowflakeRelation.buildScanFromSQL(SnowflakeRelation.scala:103)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.toRDD(QueryBuilder.scala:81)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.rdd$lzycompute(QueryBuilder.scala:28)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.rdd(QueryBuilder.scala:28)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$$anonfun$getRDDFromPlan$1.apply(QueryBuilder.scala:183)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$$anonfun$getRDDFromPlan$1.apply(QueryBuilder.scala:182)
at scala.Option.map(Option.scala:146)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$.getRDDFromPlan(QueryBuilder.scala:182)
at net.snowflake.spark.snowflake.pushdowns.SnowflakeStrategy.buildQueryRDD(SnowflakeStrategy.scala:35)
at net.snowflake.spark.snowflake.pushdowns.SnowflakeStrategy.apply(SnowflakeStrategy.scala:20)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2837)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2150)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2363)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: net.snowflake.client.jdbc.internal.snowflake.common.core.S3FileEncryptionMaterial
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Answer 2: Could you retry with a later version of the JDBC driver?
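The `NoClassDefFoundError` in Response 2 means the connector asked for `net.snowflake.client.jdbc.internal.snowflake.common.core.S3FileEncryptionMaterial` and the `snowflake-jdbc` jar on the classpath did not contain it, which points to a connector/driver version mismatch. That is presumably also why `printSchema()` succeeded: the schema is resolved when the relation is created, while `show()` actually fetches rows, which in this connector goes through a Snowflake stage (the `ConnectorSFStageManager` and `getRDDFromS3` frames in the trace). Since a jar is a zip archive, one way to check a candidate driver before uploading it is a small standard-library sketch like this; the function names are hypothetical.

```python
import zipfile
from typing import Iterable, List

# Class the connector failed to load in Response 2's stack trace.
MISSING_CLASS = "net.snowflake.client.jdbc.internal.snowflake.common.core.S3FileEncryptionMaterial"


def class_entry(fqcn: str) -> str:
    """Convert a fully qualified class name to its entry path inside a jar."""
    return fqcn.replace(".", "/") + ".class"


def missing_classes(jar_path: str, class_names: Iterable[str]) -> List[str]:
    """Return the class names that are NOT present in the given jar."""
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    return [name for name in class_names if class_entry(name) not in entries]
```

For example, a non-empty result from `missing_classes("snowflake-jdbc-3.2.7.jar", [MISSING_CLASS])` would confirm that this driver cannot satisfy the connector, before spending a Glue job run on it.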

Response 3: Thanks for looking into this. I was able to resolve the issue with the following combination of versions:

Spark 2.4, Python 3 (Glue version 1.0)
spark-snowflake_2.11-2.4.8.jar
snowflake-jdbc-3.6.12.jar
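To reproduce this combination in a job definition, the two jars are typically uploaded to S3 and passed through Glue's `--extra-jars` special parameter, a comma-separated list of S3 paths that Glue adds to the Java classpath. A minimal sketch that assembles keyword arguments for boto3's `glue.create_job`, assuming the job name, IAM role, script location, bucket and prefix are placeholders and the helper name is hypothetical:

```python
from typing import Dict, Sequence

# The combination reported to work in Response 3.
WORKING_JARS = (
    "spark-snowflake_2.11-2.4.8.jar",
    "snowflake-jdbc-3.6.12.jar",
)


def glue_job_kwargs(name: str, role: str, script_s3: str,
                    jar_bucket: str, jar_prefix: str,
                    jars: Sequence[str] = WORKING_JARS) -> Dict[str, object]:
    """Build keyword arguments for boto3's glue.create_job (hypothetical helper).

    GlueVersion "1.0" with PythonVersion "3" corresponds to Spark 2.4, Python 3.
    """
    prefix = jar_prefix.strip("/")
    keys = [f"{prefix}/{jar}" if prefix else jar for jar in jars]
    extra_jars = ",".join(f"s3://{jar_bucket}/{key}" for key in keys)
    return {
        "Name": name,
        "Role": role,
        "GlueVersion": "1.0",
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            "--extra-jars": extra_jars,  # added to the Java classpath by Glue
        },
    }
```

The result would then be passed as `boto3.client("glue").create_job(**glue_job_kwargs(...))`. Keeping the jar list in one place makes it easy to swap in a different connector/driver pair when testing other combinations.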

Answer 3: Thanks for posting these versions as well, so other users know which ones work with AWS Glue.
