How to write a PySpark DataFrame to a Redshift database

Date: 2021-02-24 07:59:15

Tags: python apache-spark pyspark amazon-redshift

I'm new to Redshift, so I could use some help.
I'm trying to write a PySpark DataFrame to the database using the plain JDBC writer (not the Databricks one, because that library doesn't work with Scala 2.12), but I'm getting a permission error. Code:

df.write.format('jdbc').options(
    url='jdbc:redshift://server:5439/db',
    driver='com.amazon.redshift.jdbc42.Driver',
    dbtable=new_table,
    user='user',
    password='pass').mode('append').save()
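
For completeness, the Spark session is created with the Amazon Redshift JDBC jar on the classpath. Below is a minimal sketch of that setup; the jar path and app name are placeholders, not the actual values from my environment:

# minimal sketch, assuming the Redshift JDBC jar has been downloaded locally;
# the path below is a placeholder
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('redshift_spark')
         # make the Amazon Redshift JDBC driver visible to the driver and executors
         .config('spark.jars', '/path/to/redshift-jdbc42.jar')
         .getOrCreate())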

Error:

21/02/24 08:42:42 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "redshift_spark.py", line 77, in <module>
   .mode('append').save()
  File "\venv\lib\site-packages\pyspark\sql\readwriter.py", line 825, in save
    self._jwrite.save()
  File "C:\apps\spark-3.0.1-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
  File "\venv\lib\site-packages\pyspark\sql\utils.py", line 128, in deco
    return f(*a, **kw)
  File "C:\apps\spark-3.0.1-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, ip-192-168-1-132.eu-west-1.compute.internal, executor driver): java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)

It looks as though I don't have write permission (the session is read-only), but where do I need to grant this permission? I tried accessing the database with the Postgres library psycopg2 and with the Postgres JDBC driver org.postgresql.Driver, and had no problems there.
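
For comparison, this is how I would expect the same write to look through the PostgreSQL JDBC driver that worked for me elsewhere (a minimal sketch with the same placeholder host and credentials; I have not confirmed whether it avoids the read-only error):

# minimal sketch: same write, but via the PostgreSQL JDBC driver instead of the Amazon one
df.write.format('jdbc').options(
    url='jdbc:postgresql://server:5439/db',  # same Redshift endpoint, Postgres protocol
    driver='org.postgresql.Driver',
    dbtable=new_table,
    user='user',
    password='pass').mode('append').save()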

0 Answers:

There are no answers yet.