I'm trying to connect to my Redshift table using Databricks and PySpark, but I'm finding the documentation hard to follow (https://docs.databricks.com/data/data-sources/aws/amazon-redshift.html). Here is what I have so far; a java.lang.NullPointerException is thrown at the line .option("aws_iam_role", "arn:aws:iam::946575530956:role/MY_IAM_ROLE"):
# I installed the Redshift JDBC driver from Amazon
df = spark.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://redshift-cluster-1.ci9fbdm1ahgn.us-east-1.redshift.amazonaws.com") \
    .option("dbtable", "suppliers") \
    .option("tempdir", "s3a://spark-redshift/temp_data/") \
    .option("password", "MY-PASSWORD") \
    .option("user", "MY-USERNAME") \
    .option("aws_iam_role", "arn:aws:iam::946575530956:role/MY-IAM-ROLE") \
    .load()
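In case the URL itself is part of the problem: as I understand it, the Redshift JDBC driver expects a URL of the shape jdbc:redshift://HOST:PORT/DATABASE (a single "://", a port, and a database name, the last two of which my URL above doesn't have). A quick local sanity check of that shape, where :5439/dev is my assumption for the default port and a placeholder database name:

```python
import re

# Assumed URL shape: jdbc:redshift://HOST:PORT/DATABASE
pattern = r"^jdbc:redshift://[^/:]+:\d+/\w+$"

# My current URL (note the extra colon after "redshift:") does not match:
bad_url = "jdbc:redshift:://redshift-cluster-1.ci9fbdm1ahgn.us-east-1.redshift.amazonaws.com"

# With a single "://", a port, and a (placeholder) database name, it does:
good_url = ("jdbc:redshift://redshift-cluster-1.ci9fbdm1ahgn"
            ".us-east-1.redshift.amazonaws.com:5439/dev")

print(bool(re.match(pattern, bad_url)))   # False
print(bool(re.match(pattern, good_url)))  # True
```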
If I remove the aws_iam_role option, I get this error instead:

IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.

I assume they mean this README: https://github.com/databricks/spark-redshift/blob/master/README.md#authenticating-to-s3-and-redshift, but it still doesn't help me much.
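From my reading of that README, the forward_spark_s3_credentials alternative (instead of aws_iam_role) would look roughly like the options below; the :5439/dev suffix on the URL is my guess at the port and database name, and I haven't been able to confirm this works:

```python
# Sketch of the forward_spark_s3_credentials route from the README:
# forward Spark's own S3 credentials to Redshift instead of an IAM role.
options = {
    "url": ("jdbc:redshift://redshift-cluster-1.ci9fbdm1ahgn"
            ".us-east-1.redshift.amazonaws.com:5439/dev"),  # port/db assumed
    "dbtable": "suppliers",
    "tempdir": "s3a://spark-redshift/temp_data/",
    "user": "MY-USERNAME",
    "password": "MY-PASSWORD",
    "forward_spark_s3_credentials": "true",  # replaces aws_iam_role
}

# Then, on a cluster:
# df = spark.read.format("com.databricks.spark.redshift") \
#          .options(**options).load()
print("aws_iam_role" in options)  # False
```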
I feel like I have all the information and permissions set up, but I may not be referencing the options correctly or something along those lines.

Any help is greatly appreciated, thanks!