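I have configured pyspark to work directly with PostgreSQL. However, I would like to use the JDBC connector to pass data from Spark to Presto, and then run queries on PostgreSQL using pyspark and Presto. How would I do that in code?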
from pyspark.sql import SparkSession
from pyspark import SparkContext,SparkConf
from pyspark.sql import SQLContext
import sys
sys.path.append('/usr/local/lib/python3.6/dist-packages')
import requests
import json, ast
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = SparkSession.builder \
    .master("local") \
    .appName("jdbc data sources") \
    .config("spark.sql.shuffle.partitions", "4") \
    .getOrCreate()
driver = "io.prestosql.jdbc.PrestoDriver"
#path = "//host:port/prestosql/?user=<username>&password=<passwd>"
path = "//host:port/prestosql<catalog>"
url = "jdbc:presto:" + path
tablename = "<tablename>"
dbDataFrame = spark.read.format("jdbc") \
    .option("url", url) \
    .option("dbtable", "<select query>") \
    .option("driver", driver) \
    .load()
What am I doing wrong? I want to run a select query on PostgreSQL through Presto and then pass the results back to Spark using pyspark.
I get the following error:
in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o53.load.
: java.sql.SQLException: Authentication using username/password requires SSL to be enabled
    at io.prestosql.jdbc.PrestoDriverUri.setupClient(PrestoDriverUri.java:160)
    at io.prestosql.jdbc.PrestoDriver.connect(PrestoDriver.java:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
When I enable .option("SSL", "true"), I get a new error:
py4j.protocol.Py4JJavaError: An error occurred while calling o84.load.
: java.sql.SQLException: Error executing query
    at io.prestosql.jdbc.PrestoStatement.internalExecute(PrestoStatement.java:284)
    at io.prestosql.jdbc.PrestoStatement.execute(PrestoStatement.java:229)
    at io.prestosql.jdbc.PrestoPreparedStatement.<init>(PrestoPreparedStatement.java:80
What am I doing wrong? Please help.
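Answer 0: (score: 1)
My guess is that there is an error in your SQL query. Spark places the dbtable value in the FROM clause of the statement it generates, so a bare select statement will not parse there; it has to be wrapped in parentheses and given an alias. The preferred syntax is something like .option("dbtable", "(select * from sample_table) a").load()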
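To make the suggestion concrete, here is a minimal sketch of how the read could be put together. The helper names (as_subquery, presto_jdbc_options, load_via_presto), the host presto.example.com, the catalog name postgresql, the schema public and the user spark are all placeholders I made up for illustration, not values from your setup. I am also assuming that the Presto coordinator is actually configured for HTTPS, since the first error says password authentication is only accepted over SSL, and that the extra options are passed through to the driver as connection properties.

```python
def as_subquery(query, alias="t"):
    """Wrap a SELECT so it can stand where a table name is expected."""
    # Drop surrounding whitespace and a trailing semicolon, which is
    # not valid inside a parenthesised subquery.
    cleaned = query.strip().rstrip(";").strip()
    return "({}) {}".format(cleaned, alias)


def presto_jdbc_options(host, port, catalog, schema, query, user, password=None):
    """Build the option dict for spark.read.format("jdbc") (hypothetical helper)."""
    options = {
        # Presto JDBC URLs take the form jdbc:presto://host:port/catalog/schema
        "url": "jdbc:presto://{}:{}/{}/{}".format(host, port, catalog, schema),
        "driver": "io.prestosql.jdbc.PrestoDriver",
        "dbtable": as_subquery(query),
        "user": user,
    }
    if password is not None:
        # Assumption: the coordinator serves HTTPS; the driver refuses
        # username/password authentication without SSL.
        options["password"] = password
        options["SSL"] = "true"
    return options


def load_via_presto(spark, options):
    """Run the query through Presto and return the result as a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder values only; replace them with your own host, catalog and schema.
example = presto_jdbc_options(
    "presto.example.com", 8080, "postgresql", "public",
    "select * from sample_table;", user="spark",
)
print(example["dbtable"])
```

With an existing SparkSession you would then call load_via_presto(spark, example). If the second error persists after wrapping the query, the full "Caused by" part of the stack trace should show whether Presto rejected the SQL or the connection itself.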