Question

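I am running a test job on AWS. I read CSV data from an S3 bucket, run a Glue ETL job on it, and store the same data in Amazon Redshift. The Glue job simply reads the data from S3 and writes it to Redshift without any modification. The job runs fine and I get the expected result in Redshift, but it returns an error that I cannot understand.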

Here is the error log:

18/11/14 09:17:31 WARN YarnClient: The GET request failed for the URL http://169.254.76.1:8088/ws/v1/cluster/apps/application_1542186720539_0001
com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.HttpHostConnectException: Connect to 169.254.76.1:8088 [/169.254.76.1] failed: Connection refused (Connection refused)

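This is a warning rather than an error, but I would like to understand what causes it. I tried searching for the IP shown in the WARN message, but I could not find a machine with that IP.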

Answer 1

I noticed the same messages in my AWS Glue job, and I found this useful reply from AWS:

This WARN message is not so special, and does not mean job failure or any errors directly. I guess there should be other cause.
I would recommend you to enable continuous logging, and check both driver/executor logs to see if there are any suspicious behavior.
If you enable job bookmark, please try disabling it and see how it goes without bookmark.

https://forums.aws.amazon.com/thread.jspa?messageID=927547

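I have had bookmarks disabled from the beginning. What I did check was how my Glue job writes data to S3. The write appeared to be running into memory problems, so what I did was repartition the data before writing: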

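# Assumes MyDynamicFrame is a Spark DataFrame (e.g. the result of .toDF() on a Glue DynamicFrame), since .write.partitionBy(...) is the DataFrame writer API.
MyDynamicFrame.coalesce(100).write.partitionBy("month").mode("overwrite").parquet("s3://"+bucket+"/"+path+"/out_data")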

So if your job has any write operations, I suggest checking how you write to S3.
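
On the IP address in the warning: 169.254.0.0/16 is the IPv4 link-local range, and 8088 is the default port of the YARN ResourceManager web service (the `/ws/v1/cluster/apps/...` path is its REST API). That would explain why no machine with that IP shows up in your own network: the address is most likely internal to the cluster that Glue manages for the job, although that last part is an inference and not something the log states. The following is a minimal Python sketch for checking this from a log line. The helper name `describe_warn_target` is hypothetical, and the log line is the one from the question.

```python
import ipaddress
import re

# Log line copied from the question.
WARN_LINE = (
    "18/11/14 09:17:31 WARN YarnClient: The GET request failed for the URL "
    "http://169.254.76.1:8088/ws/v1/cluster/apps/application_1542186720539_0001"
)


def describe_warn_target(line):
    """Extract the IPv4 host and port from the first URL in a log line.

    Returns None when the line contains no URL with an IPv4 host.
    (Hypothetical helper, written only for this illustration.)
    """
    match = re.search(r"https?://(\d{1,3}(?:\.\d{1,3}){3}):(\d+)", line)
    if match is None:
        return None
    address = ipaddress.ip_address(match.group(1))
    return {
        "host": str(address),
        "port": int(match.group(2)),
        # True for addresses in 169.254.0.0/16 (IPv4 link-local).
        "is_link_local": address.is_link_local,
    }


info = describe_warn_target(WARN_LINE)
print(info)
```

If `is_link_local` is true, the address cannot be routed outside the local network segment, so looking for it among your own instances will not find anything.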

AWS Glue job runs correctly but returns a connection refused error
