Hadoop remote copy

Time: 2015-03-16 09:24:40

Tags: java hadoop

I need to copy some files from hdfs:///user/hdfs/path1 to hdfs:///user/hdfs/path2. I wrote some Java code to do the job:

// Act as the remote user "hdfs" with simple (non-Kerberos) authentication
ugi = UserGroupInformation.createRemoteUser("hdfs", AuthMethod.SIMPLE);
System.out.println(ugi.getUserName());
conf = new org.apache.hadoop.conf.Configuration();
// TODO: Change IP
conf.set("fs.defaultFS", URL);
conf.set("hadoop.job.ugi", user);
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// paths = new ArrayList<>();
fs = FileSystem.get(conf);

I fetch all paths matching the wildcard with

fs.globStatus(new Path(regPath));

and copy each of them with

FileUtil.copy(fs, p, fs, new Path(to + "/" + p.getName()), false, true, conf);
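
Put together, the glob-and-copy loop looks roughly like this (just a sketch assembled from the snippets above: regPath holds the source wildcard, to the destination directory, and fs/conf are the objects created earlier; FileStatus, FileUtil and Path come from org.apache.hadoop.fs):

// Expand the wildcard on HDFS and copy every match into the destination directory
for (FileStatus status : fs.globStatus(new Path(regPath))) {
    Path p = status.getPath();
    // same FileSystem for source and destination; keep the source, overwrite the target
    FileUtil.copy(fs, p, fs, new Path(to + "/" + p.getName()), false, true, conf);
}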

However, while globStatus executes successfully, the copy fails with the following message:

WARN  BlockReaderFactory:682 - I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.110.80.177:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3044)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:744)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:659)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:327)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:574)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)

Note that I run the code remotely over the Internet using port forwarding, i.e.

192.168.1.10 [JAVA API] ---> 154.23.56.116:8082 [Name Node public IP] ====== 10.1.3.4:8082 [Name Node private IP]

I think the cause is the following:

  1. The query is handled by the namenode.
  2. That is why globStatus executes successfully: only the namenode is involved.
  3. The copy command also goes to the namenode, which returns the addresses of datanodes on other machines, such as 10.110.80.177:50010; the Java client then tries to talk to those datanodes directly, and since they are not exposed to the outside world, I get this error.
  4. Am I right in this deduction? How can I solve it? Do I need to create a Java server on the namenode that accepts copy commands and copies the files locally within the cluster? (One possible client-side setting is sketched below.)
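
For reference, one client-side setting that is often mentioned for exactly this topology (client outside the cluster network, namenode handing out datanode private IPs) is dfs.client.use.datanode.hostname, which makes the DFS client connect to datanodes by hostname instead of the IP address returned by the namenode. This is only a sketch, and it assumes the datanode hostnames are actually resolvable and reachable (e.g. via the same port forwarding) from the machine running the Java API:

conf = new org.apache.hadoop.conf.Configuration();
conf.set("fs.defaultFS", URL);
// Connect to datanodes via their hostnames rather than the private IPs
// advertised by the namenode (assumes those hostnames resolve from the client)
conf.setBoolean("dfs.client.use.datanode.hostname", true);
fs = FileSystem.get(conf);

If the datanodes cannot be reached from the client at all, the copy has to run inside the cluster (as point 4 suggests), because HDFS reads and writes always go directly between the client and the datanodes.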

0 Answers:

No answers