Question

I am following the spark-redshift tutorial to read from Redshift into Spark (Databricks). I have the following code:

val tempDir = "s3n://{my-s3-bucket-here}"



val jdbcUsername = "usernameExample"
val jdbcPassword = "samplePassword"
val jdbcHostname = "redshift.companyname.xyz"
val jdbcPort = 9293
val jdbcDatabase = "database"
val jdbcUrl = "sampleURL"


sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "SAMPLEAWSKEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SECRETKEYHERE")

val subs_dim = sqlContext.read.format("com.databricks.spark.redshift").option("url", jdbcUrl).option("tempdir", tempDir).option("dbtable", "example.exampledb").load()

Now, when I try to run this, I get:

java.lang.IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.

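I'm a bit confused, because I have already defined the AWS access key ID with sc.hadoopConfiguration.set. I'm new to the company, so I'm wondering whether the AWS keys are wrong or whether I'm missing something else?

Thanks!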

Answer 1

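The only cause I can see is that you have not set forward_spark_s3_credentials, so the S3 credentials are not being passed along to the Redshift connection.

Add the following option to your read call, before load().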

.option("forward_spark_s3_credentials", "true")

See the following excerpt from the documentation.

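Forward Spark's S3 credentials to Redshift: if the forward_spark_s3_credentials option is set to true, then this library will automatically discover the credentials that Spark is using to connect to S3 and will forward those credentials to Redshift over JDBC.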
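
Applied to the read call from the question, this might look like the sketch below. It assumes a Databricks notebook where sqlContext is already defined, and it reuses jdbcUrl, tempDir, and the s3n keys exactly as set in the question; none of those values are verified here.

// Sketch only: jdbcUrl and tempDir are the values defined in the question.
val subs_dim = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "example.exampledb")
  // Tell the library to pass Spark's S3 credentials on to Redshift over JDBC.
  .option("forward_spark_s3_credentials", "true")
  .load()

The keys set through sc.hadoopConfiguration.set only let Spark itself read and write the temp directory. Redshift needs its own way to reach S3, which is why the error asks for one of aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*.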

Hope this helps!

Reading from Redshift into a Spark DataFrame (spark-redshift module)
