AWS Glue job: Command failed with exit code 1

Time: 2018-06-25 09:31:40

Tags: parquet aws-glue

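We have a Python script for a Glue job that is triggered every hour to convert JSON files in S3 to Parquet files, and we are running into the following problem. The log below was taken from CloudWatch for the jobId: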

CoarseGrainedExecutorBackend: Driver commanded a shutdown
18/06/25 08:54:03 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-31-34-26.ec2.internal/172.31.34.26:36135 is closed
18/06/25 08:54:03 ERROR OneForOneBlockFetcher: Failed while starting block fetches
java.io.IOException: Connection from ip-172-31-34-26.ec2.internal/172.31.34.26:36135 closed
        at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
        at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:108)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:278)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:748)
18/06/25 08:54:03 INFO CoarseGrainedExecutorBackend: Driver from 172.31.47.44:45951 disconnected during shutdown
18/06/25 08:54:03 INFO CoarseGrainedExecutorBackend: Driver from 172.31.47.44:45951 disconnected during shutdown
18/06/25 08:54:03 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
18/06/25 08:54:03 INFO MemoryStore: MemoryStore cleared
18/06/25 08:54:03 INFO BlockManager: BlockManager stopped
18/06/25 08:54:03 INFO ShutdownHookManager: Shutdown hook called

3 answers:

Answer 0 (score: 1)

Open Glue > Jobs > Edit your job > Script libraries and job parameters (optional) > Job parameters near the bottom. Set the following: key: --conf value: spark.yarn.executor.memoryOverhead=1024 spark.driver.memory=10g
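If you would rather set the same parameter from code than from the console, a minimal sketch is shown below. It only builds the job-argument dictionary; the helper name build_conf_value is hypothetical, and the value is joined with a space exactly as written in this answer, which is an assumption about how Glue parses it rather than documented behavior.

```python
def build_conf_value(settings):
    """Join Spark settings into the single string used as the --conf value.

    Assumption: settings are separated by a space, as in the answer above.
    """
    return " ".join(f"{key}={value}" for key, value in settings.items())


# The two settings suggested in the answer.
settings = {
    "spark.yarn.executor.memoryOverhead": "1024",
    "spark.driver.memory": "10g",
}

# Glue job parameters are plain key/value string pairs.
job_arguments = {"--conf": build_conf_value(settings)}
print(job_arguments)

# With boto3 this dictionary could be passed when starting a run, e.g.
# boto3.client("glue").start_job_run(JobName="my-job", Arguments=job_arguments)
# ("my-job" is a placeholder; the call is not executed here).
```

Keeping the settings in a dictionary makes it easy to adjust one value at a time while checking whether the job still fails.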

Answer 1 (score: 0)

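We could not resolve this issue; AWS Glue still needs many enhancements. For now, we have split the folder into multiple subfolders and split the Glue job into two to handle this situation. The memory overhead setting was also not taken into account when we supplied our own script options.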
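The splitting described here could be planned along the following lines. This is only a sketch: the prefix names are made up, split_prefixes is a hypothetical helper, and the round-robin assignment is one possible way to divide the subfolders between two jobs, not necessarily what was done in practice.

```python
def split_prefixes(prefixes, job_count=2):
    """Assign subfolder prefixes to jobs in round-robin order."""
    assignments = [[] for _ in range(job_count)]
    for index, prefix in enumerate(sorted(prefixes)):
        assignments[index % job_count].append(prefix)
    return assignments


# Hypothetical subfolders created from the original input folder.
prefixes = [
    "input/part-0/",
    "input/part-1/",
    "input/part-2/",
    "input/part-3/",
    "input/part-4/",
]

assignments = split_prefixes(prefixes)
for job_number, job_prefixes in enumerate(assignments):
    print(f"job {job_number}: {job_prefixes}")
```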

Answer 2 (score: 0)

You need to reduce the number of files stored in the S3 bucket by accumulating the data into one large file; Glue is efficient with larger files.
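One way to approach the compaction is to first plan which small files get merged together. The sketch below groups objects greedily until a target size is reached; the file names, sizes, and the plan_batches helper are all hypothetical, and the actual merging and upload to S3 are left out.

```python
def plan_batches(objects, target_bytes):
    """Greedily group (key, size) pairs into batches of about target_bytes.

    A single object larger than target_bytes gets a batch of its own.
    """
    batches = []
    current, current_size = [], 0
    for key, size in objects:
        # Close the current batch if adding this object would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical small JSON objects with their sizes in bytes.
objects = [("a.json", 40), ("b.json", 50), ("c.json", 30), ("d.json", 90)]
batches = plan_batches(objects, target_bytes=100)
print(batches)
```

Each batch would then be concatenated into one larger file before the Glue job reads it, so the job sees far fewer input objects.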