I am trying to connect to Redshift from Spark (running on Databricks):
from pyspark.sql import SQLContext
# S3 credentials so the Redshift connector can write to the temporary directory
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY)
# IP addresses from Redshift Security Group panel
IP_ADDRESSES_TO_ADD = ["1.2.3.4/32", "5.6.7.8/32"]
PORTS_TO_ADD = ["80", "443"]
PROTOCOLS_TO_ADD = ["tcp"]
# Read data from a query
df = sqlContext.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://XXX.XXX.eu-west-1.redshift.amazonaws.com:5439/REDSHIFT_DB?user=REDSHIFT_USER&password=REDSHIFT_PW&ssl=true&sslfactory=com.amazon.redshift.ssl.NonValidatingFactory") \
    .option("query", "select * FROM REDSHIFT_TABLE LIMIT 10") \
    .option("tempdir", "s3n://path/to/temp/") \
    .load()
But I get the following error:
java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out.
Am I missing something?
Answer 0 (score: 1)
This looks like a connectivity problem. Please verify that your client is actually authorized to reach the cluster.
To check this, run the following command:
telnet XXX.XXX.eu-west-1.redshift.amazonaws.com 5439
If you are authorized, you should see something like:
Trying <IP address>...
Connected to <Host name>.
Escape character is '^]'.
But if you get a connection timeout instead, your client is not authorized to reach the cluster.
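If telnet is not installed on the cluster nodes, the same check can be done from a notebook with Python's standard socket module. This is a minimal sketch; the host below is the same placeholder endpoint used in the question and must be replaced with your own:

import socket

# Placeholder endpoint from the question -- replace with your cluster's endpoint.
host = "XXX.XXX.eu-west-1.redshift.amazonaws.com"
port = 5439

try:
    # Attempt a plain TCP connection, equivalent to the telnet test above.
    with socket.create_connection((host, port), timeout=10):
        print("TCP connection to {}:{} succeeded".format(host, port))
except OSError as exc:
    print("TCP connection failed: {}".format(exc))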
Answer 1 (score: 0)
How do you spin up your Databricks cluster nodes? Are they on-demand? Every time the cluster terminates, you get a new set of IP addresses (EC2 instances) the next time you start it. You therefore need to make sure the newly assigned IP addresses are whitelisted in the security group's inbound rules so they can reach Redshift; a sketch of how this could be scripted follows below.
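One way to automate this, assuming the Redshift cluster sits in a VPC and you know which security group is attached to it, is to add the new node IPs to that group's inbound rules with boto3. This is only a sketch: the region, security group ID, and CIDR blocks below are placeholders you would replace with your own values.

import boto3

# Placeholder values -- replace with your own region, security group ID, and node IPs.
REGION = "eu-west-1"
SECURITY_GROUP_ID = "sg-0123456789abcdef0"   # security group attached to the Redshift cluster
NODE_CIDRS = ["1.2.3.4/32", "5.6.7.8/32"]    # current IPs of the Databricks driver/worker nodes

ec2 = boto3.client("ec2", region_name=REGION)
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5439,   # Redshift's default port
        "ToPort": 5439,
        "IpRanges": [{"CidrIp": cidr} for cidr in NODE_CIDRS],
    }],
)

Note that authorize_security_group_ingress raises an error if an identical rule already exists, so in practice you may want to check the group's existing rules first or catch that exception.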