Connection timeout on a JDBC connection from AWS Glue to RDS

Time: 2020-10-22 23:22:55

Tags: jdbc pyspark aws-glue

I am trying to connect to a PostgreSQL RDS instance directly from my AWS Glue script. Connecting with the generated code works, but a JDBC-type connection does not. Here is the code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions as F
from pyspark.sql.functions import *

## Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()

job.commit()

Part of the error:

An error occurred while calling o74.load. : java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out. at com.amazon.redshift.client.PGClient.connect ....

Additional information:

Thanks in advance. Let me know if you need more information.

1 answer:

Answer 0 (score: 1)

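I just found the cause: I had not specified the port in the JDBC URL. I don't remember having to include the port before. Once I added it, everything worked.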

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host:5432/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()
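
Since the fix comes down to always including the port in the URL, one option is to build the URL through a small helper so the port cannot be left out by accident. The following is only a minimal sketch: `build_postgres_jdbc_url` is a hypothetical helper name, not part of AWS Glue or PySpark, and 5432 is assumed as the default because it is the port used in the working URL above.

```python
def build_postgres_jdbc_url(host, database, port=5432):
    """Return a PostgreSQL JDBC URL that always carries an explicit port.

    Hypothetical helper for illustration; not part of AWS Glue or PySpark.
    The default of 5432 is an assumption based on the working URL above.
    """
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Placeholder host and database name, as in the question.
url = build_postgres_jdbc_url("host", "database_name")
print(url)
```

The resulting string could then be passed as `.option('url', url)` in place of the hard-coded URL.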