Question

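I'm trying to connect to a PostgreSQL RDS instance directly from my AWS Glue script. Connecting with the code that Glue generated works, but connecting with the JDBC type does not. Here is the code: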

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions as F
from pyspark.sql.functions import *

## Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()

job.commit()

Part of the error:

An error occurred while calling o74.load. : java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out. at com.amazon.redshift.client.PGClient.connect ....
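A "Connection timed out" error generally means the TCP connection to the database never completed, which is different from a rejected login. One way to separate network problems (wrong host, wrong port, security groups) from driver problems is to probe raw TCP reachability from inside the job. This is a minimal sketch using only the Python standard library. `can_open_tcp` is a hypothetical helper name, and `host` is the placeholder from the question.

```python
import socket


def can_open_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each address it returns
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass),
        # refused connections, and DNS resolution failures
        return False


# Hypothetical usage inside the Glue job ("host" is the question's placeholder):
# print(can_open_tcp("host", 5432))
```

If the probe returns False on the port the database actually listens on, the JDBC options are not the problem; look at the URL's host/port and the VPC setup instead.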

Additional information:

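It did work before, but I don't know what has changed since then.
I tested the connection with "Test connection" in AWS Glue, and it works.
I configured the VPC security group for RDS to allow inbound/outbound traffic from/to the same security group (based on this guide: https://docs.aws.amazon.com/glue/latest/dg/setup-vpc-for-glue-access.html).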
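The guide linked above relies on a self-referencing inbound rule: the security group allows TCP traffic whose source is the same security group. A minimal sketch of that rule as a boto3 `authorize_security_group_ingress` payload is below, assuming the rule should cover all TCP ports. The helper names and the group ID `sg-0123` are hypothetical; the client is passed in so the payload can be inspected without AWS credentials.

```python
def self_referencing_ingress(group_id: str) -> list:
    """Build an IpPermissions list allowing all TCP traffic from the same security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            # The source is the security group itself, i.e. a self-referencing rule
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    ]


def add_self_referencing_rule(ec2_client, group_id: str) -> None:
    """Apply the rule using a boto3 EC2 client, e.g. boto3.client("ec2")."""
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_ingress(group_id),
    )


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# add_self_referencing_rule(boto3.client("ec2"), "sg-0123")
```

Note that this rule only helps if the Glue connection and the RDS instance actually use that security group. It does nothing for a URL that points at the wrong port.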

Thanks in advance, and let me know if you need more information.

Answer 1

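I just found the cause: I hadn't specified the port in the JDBC URL. I don't remember including the port before, but after adding it everything worked.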

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host:5432/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()
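To avoid leaving the port out again, the reader options can be built in one place with the port always written into the URL. This is a minimal sketch; `postgres_jdbc_options` is a hypothetical helper, and the default of 5432 is only the PostgreSQL default, so use whatever port the RDS endpoint actually shows. The dict is passed to Spark's `DataFrameReader.options(**...)`.

```python
def postgres_jdbc_options(host: str, database: str, query: str,
                          user: str, password: str, port: int = 5432) -> dict:
    """Build Spark JDBC reader options with the port written into the URL explicitly."""
    return {
        # Always include the port; 5432 is the PostgreSQL default
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        # A subquery passed through dbtable must be wrapped and aliased
        "dbtable": f"({query}) AS t",
        "user": user,
        "password": password,
    }


# Usage inside the Glue job (placeholders from the answer):
# df = spark.read.format("jdbc").options(**postgres_jdbc_options(
#     "host", "database_name", "SELECT * FROM table WHERE name = 'abcd'",
#     "username", "password")).load()
```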

Connection timeout on JDBC connection from AWS Glue to RDS
