distcp fails with the error "No space left on device"

Posted: 2015-08-20 06:02:30

Tags: hadoop amazon-s3 hdfs snapshot distcp

I am copying an HDFS snapshot to an S3 bucket and getting the error below. The command I am running is:

    hadoop distcp /.snapshot/$SNAPSHOTNAME s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME

15/08/20 06:50:07 INFO mapreduce.Job:  map 38% reduce 0%
15/08/20 06:50:08 INFO mapreduce.Job:  map 39% reduce 0%
15/08/20 06:52:15 INFO mapreduce.Job:  map 41% reduce 0%
15/08/20 06:52:37 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000004_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more

15/08/20 06:53:13 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000007_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more

However, the device has plenty of space (4 TB). Please help.
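For reference, the same copy can also be written with the credentials passed as Hadoop properties instead of embedded in the URI, which avoids shell-quoting problems with the secret key. This is only a sketch; the fs.s3a.access.key and fs.s3a.secret.key property names come from the s3a connector's documentation, not from the original post, and assume a Hadoop build where s3a is available:

    hadoop distcp \
        -D fs.s3a.access.key=$ACCESSKEY \
        -D fs.s3a.secret.key=$SECRETKEY \
        /.snapshot/$SNAPSHOTNAME \
        s3a://$BUCKET/$SNAPSHOTNAME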

1 Answer:

Answer 0 (score: 2)

I ran into the same problem, and this is what finally worked for me:

    hadoop distcp -D mapreduce.job.maxtaskfailures.per.tracker=1 ...

I tried several things (with help from a colleague), but the main change that worked was setting the maximum task failures per tracker to 1. That was the key. Individual nodes were running out of local disk space, so this setting forced the job to stop retrying a task on a node once it had already failed there.
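For context, a minimal sketch of the full invocation with this setting, applied to the command from the question (the snapshot path and credential variables are the question's placeholders, not values I can verify):

    hadoop distcp \
        -D mapreduce.job.maxtaskfailures.per.tracker=1 \
        /.snapshot/$SNAPSHOTNAME \
        s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME

Note that -D options are parsed by Hadoop's generic options parser, so they must come before the source and destination paths.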

Other things I tried that did not work (sketched below):

1. Increasing the number of mappers (-m).
2. Increasing the number of retries from 3 to 12 (-D yarn.app.mapreduce.client.max-retries=12).
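For reference, those attempts looked roughly like this; the map count is a placeholder, since the original answer does not say what value was used:

    # 1. More mappers -- did not help
    hadoop distcp -m 20 /.snapshot/$SNAPSHOTNAME s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME

    # 2. More client-side retries (3 -> 12) -- did not help either
    hadoop distcp -D yarn.app.mapreduce.client.max-retries=12 /.snapshot/$SNAPSHOTNAME s3a://$ACCESSKEY:$SECRETKEY@$BUCKET/$SNAPSHOTNAME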