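I'm trying to connect to a PostgreSQL RDS instance directly from my AWS Glue script. When I connect using the generated code, it works fine. But connecting through the JDBC format doesn't work. Here is the code: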
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions as F
from pyspark.sql.functions import *
## Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
df = spark \
.read \
.format('jdbc') \
.option('url', 'jdbc:postgresql://host/database_name') \
.option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
.option('user', 'username') \
.option('password', 'password') \
.load()
job.commit()
Part of the error:
An error occurred while calling o74.load. : java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out. at com.amazon.redshift.client.PGClient.connect ....
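"Connection timed out" means the TCP handshake to the database never completed, so the problem is about reaching host:port, not about credentials or the query. The following is a minimal sketch of a reachability check you could run from the job itself, assuming plain Python sockets are available there. `can_reach` is a hypothetical helper name, not part of Glue or PySpark.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it only tests network reachability
    (host name, port, security groups, routing), not database credentials.
    """
    try:
        # create_connection resolves the host name and performs a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass),
        # refused connections, and DNS resolution failures
        return False
```

For example, `can_reach("<your RDS endpoint>", 5432)` returning False points to a network or port problem rather than a Spark or driver problem.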
Additional information: I'm happy to provide more details if needed. Thanks in advance.
Answer 0 (score: 1)
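I just found the cause: I hadn't specified the port in the JDBC URL. I don't remember ever having to include the port before. After adding it, everything worked: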
df = spark \
.read \
.format('jdbc') \
.option('url', 'jdbc:postgresql://host:5432/database_name') \
.option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
.option('user', 'username') \
.option('password', 'password') \
.load()
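The stack trace above mentions `com.amazon.redshift.client`, which suggests the URL was not handled by the stock PostgreSQL driver, so relying on an implied default port seems risky. A minimal sketch that always spells the port out, assuming you pass the options to Spark's JDBC reader; `postgres_jdbc_options` is a hypothetical helper, and the arguments mirror the placeholders in the code above.

```python
from typing import Dict


def postgres_jdbc_options(
    host: str,
    database: str,
    user: str,
    password: str,
    query: str,
    port: int = 5432,
) -> Dict[str, str]:
    """Build option values for Spark's JDBC source with an explicit port.

    Hypothetical helper; 5432 is PostgreSQL's default port.
    """
    # Always write host:port into the URL instead of relying on a driver default
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    return {
        "url": url,
        # 'dbtable' accepts a parenthesised subquery with an alias
        "dbtable": f"({query}) AS t",
        "user": user,
        "password": password,
    }
```

Usage: `df = spark.read.format('jdbc').options(**opts).load()`, where `opts` is the dictionary returned above.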