NullPointerException when trying to connect to a Redshift database from Databricks using PySpark?

Asked: 2021-07-06 08:10:32

Tags: amazon-web-services pyspark amazon-redshift databricks

I'm trying to connect to my Redshift table using Databricks and PySpark, but I'm finding the documentation hard to follow (https://docs.databricks.com/data/data-sources/aws/amazon-redshift.html). Here's what I have so far, which throws a java.lang.NullPointerException from the line `.option("aws_iam_role", "arn:aws:iam::946575530956:role/MY_IAM_ROLE") \`:

# I installed the Redshift JDBC driver from Amazon
df = spark.read \
  .format("com.databricks.spark.redshift") \
  .option("url", "jdbc:redshift://redshift-cluster-1.ci9fbdm1ahgn.us-east-1.redshift.amazonaws.com") \
  .option("dbtable", "suppliers") \
  .option("tempdir", "s3a://spark-redshift/temp_data/") \
  .option("password", "MY-PASSWORD") \
  .option("user", "MY-USERNAME") \
  .option("aws_iam_role", "arn:aws:iam::946575530956:role/MY-IAM-ROLE") \
  .load()
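One thing worth checking first: the URL in the snippet originally read `jdbc:redshift:://…` (doubled colon), and a Redshift JDBC URL normally also carries a port (5439 by default) and a database name. As a minimal sketch, a hypothetical helper (not part of the connector) that builds a well-formed URL, with `"dev"` standing in for whatever the actual database name is:

```python
def redshift_jdbc_url(host: str, database: str, port: int = 5439) -> str:
    """Build a Redshift JDBC URL of the form
    jdbc:redshift://<host>:<port>/<database>."""
    return f"jdbc:redshift://{host}:{port}/{database}"

# Cluster endpoint from the question; "dev" is a placeholder database name.
url = redshift_jdbc_url(
    "redshift-cluster-1.ci9fbdm1ahgn.us-east-1.redshift.amazonaws.com",
    "dev",
)
print(url)
# → jdbc:redshift://redshift-cluster-1.ci9fbdm1ahgn.us-east-1.redshift.amazonaws.com:5439/dev
```

Passing a URL like this to `.option("url", url)` rules out malformed-URL causes before digging into the IAM side.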

If I remove aws_iam_role, I get this error instead: IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.) I assume they're referring to this: https://github.com/databricks/spark-redshift/blob/master/README.md#authenticating-to-s3-and-redshift. That still doesn't help me much.
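That second error is the connector checking that at least one S3 authentication method was configured. As an illustration only (a hypothetical helper, not connector code), the check can be thought of as scanning the options you passed for one of the three mechanisms the message names:

```python
def s3_auth_methods(options: dict) -> list:
    """Return which of the connector's S3 auth mechanisms appear
    in an options dict: aws_iam_role, forward_spark_s3_credentials,
    or any temporary_aws_* key."""
    methods = []
    if "aws_iam_role" in options:
        methods.append("aws_iam_role")
    if options.get("forward_spark_s3_credentials") == "true":
        methods.append("forward_spark_s3_credentials")
    if any(k.startswith("temporary_aws_") for k in options):
        methods.append("temporary_aws_*")
    return methods

opts = {
    "dbtable": "suppliers",
    "aws_iam_role": "arn:aws:iam::946575530956:role/MY-IAM-ROLE",
}
# Exactly one mechanism should be configured; zero triggers the
# IllegalArgumentException quoted above.
assert s3_auth_methods(opts) == ["aws_iam_role"]
```

So the `aws_iam_role` option itself is the right way to satisfy that requirement; the remaining question is why supplying it leads to the NullPointerException.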

I feel like I've set up all the information and permissions, but I may not be referencing the options correctly, or something along those lines.

Any help is much appreciated, thanks!

0 Answers
