Error reading data from Redshift in Spark Scala

Asked: 2017-02-04 14:49:14

Tags: amazon-web-services apache-spark amazon-s3 amazon-redshift

I am trying to load data from Redshift using Spark, and I have completed all the required setup. Below is the code that reads a table to check whether it works.

val sparkSession = SparkSession
  .builder
  .master("local")
  .appName("CollabrativeFilter")
  .config("spark.sql.warehouse.dir", "file:///c:/Temp/spark-warehouse")
  .config("spark.sql.crossJoin.enabled", true)
  .getOrCreate()

val awsAccessKeyId = "value"
val awsSecretAccessKey = "value"
val redshiftDBName = "value"
val redshiftUserId = "value"
val redshiftPassword = "value"
val redshifturl = "value"
val jdbcURL = s"jdbc:redshift://$redshifturl/$redshiftDBName?user=$redshiftUserId&password=$redshiftPassword"

val tempS3Dir = "s3n://accessid:secretkey@bucket/"


val eventsDF = sparkSession.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcURL)
  .option("tempdir", tempS3Dir)
  .option("dbtable", "table_name")
  .option("forward_spark_s3_credentials", "true")
  .load()

eventsDF.show()

When I run it, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/S3ObjectInputStream
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:51)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at new_test$.main(new_test.scala:42)
at new_test.main(new_test.scala)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.model.S3ObjectInputStream
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)

I googled around and found that I may have to add some dependencies to my Maven pom.xml. Please help me resolve this error.
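For what it's worth, the missing class `com.amazonaws.services.s3.model.S3ObjectInputStream` lives in the AWS Java SDK's S3 module, so the `NoClassDefFoundError` usually means that artifact is absent from the classpath. A pom.xml sketch along these lines would typically pull it in, alongside the spark-redshift connector and Hadoop's S3 filesystem support; the version numbers here are assumptions and should be matched to your Spark/Scala/Hadoop build:

```xml
<!-- Sketch only: versions below are assumptions, not known-good pins. -->

<!-- The Databricks Redshift data source used by .format("com.databricks.spark.redshift") -->
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-redshift_2.11</artifactId>
  <version>3.0.0-preview1</version>
</dependency>

<!-- Provides com.amazonaws.services.s3.model.S3ObjectInputStream,
     the class reported missing in the stack trace -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.11.98</version>
</dependency>

<!-- Hadoop's S3 filesystem implementations (s3n/s3a), needed for the tempdir -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.7.3</version>
</dependency>
```

Note that `hadoop-aws` and `aws-java-sdk-s3` versions are tightly coupled, so if the error changes to a different `NoClassDefFoundError` or a method-not-found error, the two usually need to be realigned rather than bumped independently.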

0 Answers:

No answers yet