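I am trying to write a DataFrame to a Postgres database, following the question here. Like that user, I have confirmed that the code runs when launched directly from the command line, but I need to run it from a Jupyter Notebook. This is what I have: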
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClassPath', '/home/hadoop/postgresql-42.2.5.jar').getOrCreate()
df = spark.read.format("parquet").option("inferSchema","false").option("header","true").load(input).select(fields)
mode = "append"
url = "jdbc:postgresql://host/db"
properties = {"user": "user","password": "password","driver": "org.postgresql.Driver"}
df.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
But I keep getting this error:
An error occurred while calling o188.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver
Could someone please point out what I am doing wrong? I am at my wits' end.
Edit:
I have updated my code to this:
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClassPath', '/home/hadoop/postgresql-42.2.5.jar').getOrCreate()
mode = "append"
url = "jdbc:postgresql://host/db"
properties = {"user": "user","password": "password"}
df.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
And now I get a different error:
An error occurred while calling o236.jdbc.
: java.sql.SQLException: No suitable driver
Edit 2:
The linked post applies to a dockerized MySQL setup that does not use a Jupyter notebook.
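A note on what both errors may have in common, offered as an assumption rather than a confirmed diagnosis: `ClassNotFoundException: org.postgresql.Driver` and `No suitable driver` are both consistent with the driver jar not being on the JVM classpath. In a notebook, `getOrCreate()` can return a session whose JVM is already running, in which case `spark.driver.extraClassPath` set on the builder would come too late to take effect. A minimal sketch of one common workaround is to set `PYSPARK_SUBMIT_ARGS` before any Spark session exists in the kernel. The helper name `build_submit_args` is hypothetical; the jar path is the one from the code above.

```python
import os

# Jar path copied from the question; adjust it to wherever the driver jar actually lives.
JDBC_JAR = "/home/hadoop/postgresql-42.2.5.jar"


def build_submit_args(jar_path):
    """Hypothetical helper: build a PYSPARK_SUBMIT_ARGS value that adds the jar
    to the driver and executor classpaths."""
    # --driver-class-path and --jars are standard spark-submit options.
    # The trailing "pyspark-shell" token is required when PySpark launches the JVM.
    return f"--driver-class-path {jar_path} --jars {jar_path} pyspark-shell"


# This must run before the first SparkSession (and its JVM) is created in the kernel.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(JDBC_JAR)

# Afterwards, in the same notebook (commented out so this cell needs no Spark install):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# properties = {"user": "user", "password": "password", "driver": "org.postgresql.Driver"}
# df.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
```

If the kernel has already started a Spark session before this cell runs, the variable has no effect and the kernel would need to be restarted first. Notebooks that reach Spark through a remote gateway rather than a local JVM would need the jar configured on that side instead.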