I get this error when I try to write a Spark DataFrame to a PostgreSQL database. I am using a local cluster, and the code is as follows:
from pyspark import SparkContext
from pyspark import SQLContext, SparkConf
import os

os.environ["SPARK_CLASSPATH"] = '/usr/share/java/postgresql-jdbc4.jar'

conf = SparkConf() \
    .setMaster('local[2]') \
    .setAppName("test")

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sc.parallelize([("a", "b", "c", "d")]).toDF()

url_connect = "jdbc:postgresql://localhost:5432"
table = "table_test"
mode = "overwrite"
properties = {"user": "postgres", "password": "12345678"}

df.write.option('driver', 'org.postgresql.Driver').jdbc(
    url_connect, table, mode, properties)
The error log is as follows:
Py4JJavaError: An error occurred while calling o119.jdbc.
: java.lang.NullPointerException
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:308)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
I have tried searching online for an answer but could not find one. Thanks in advance!
Answer 0 (score: 1)
Have you tried specifying the database in the table_test variable? I have a similar implementation that looks like this:
mysqlUrl = "jdbc:mysql://mysql:3306"
properties = {
    'user': 'root',
    'password': 'password',
    'driver': 'com.mysql.cj.jdbc.Driver'
}
table = 'db_name.table_name'

try:
    schemaDF = spark.read.jdbc(mysqlUrl, table, properties=properties)
    print('schema DF loaded')
except Exception as e:
    print('schema DF does not exist!')
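Applied to the question's PostgreSQL setup, the suggestion amounts to appending the database name to the JDBC URL (or schema-qualifying the table name). A minimal sketch, where "mydb" is a placeholder database name, not one from the question:

```python
# Hypothetical sketch: the same suggestion applied to the question's URL.
# "mydb" is a placeholder; substitute the real database name.
url_connect = "jdbc:postgresql://localhost:5432/mydb"
table = "table_test"  # or schema-qualified, e.g. "public.table_test"

# The JDBC URL now ends with the database to connect to:
print(url_connect.rsplit("/", 1)[-1])  # → mydb
```

With the database named in the URL, the same `df.write...jdbc(url_connect, table, mode, properties)` call from the question can resolve the target table.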
Answer 1 (score: 0)
I ran into the same problem with MySQL.
The way to solve it was to find the right JAR (the JDBC driver jar) and make it visible to Spark.
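One common pitfall here is how the driver jar is supplied: the SPARK_CLASSPATH environment variable used in the question is deprecated in newer Spark versions. A hedged sketch of the configuration keys that replace it, using the jar path from the question (the dict is illustrative; each pair would be applied via SparkConf().set(key, value) before creating the SparkContext):

```python
# Hypothetical configuration sketch: keys that make a JDBC driver jar
# visible to Spark in place of the deprecated SPARK_CLASSPATH variable.
# The jar path is the one from the question.
jdbc_jar = "/usr/share/java/postgresql-jdbc4.jar"

spark_jdbc_conf = {
    "spark.jars": jdbc_jar,                   # ship the jar to executors
    "spark.driver.extraClassPath": jdbc_jar,  # put it on the driver's classpath
}
print(sorted(spark_jdbc_conf))  # → ['spark.driver.extraClassPath', 'spark.jars']
```

Alternatively, the jar can be passed on the command line with `spark-submit --jars /usr/share/java/postgresql-jdbc4.jar`.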