What is the correct way to find out where a file is located on an HDFS cluster?

Time: 2017-08-18 13:49:14

Tags: java hadoop hdfs hadoop-partitioning

I need to develop my own job executor that exploits datanode locality (this is not homework).

I have a Hadoop 2.7.1 cluster with 2 datanodes.

(See http://jugsi.blogspot.it/2017/08/configuring-hadoop-271-on-windows-w-ssh.html for the cluster setup.)

My code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //conf.set("fs.default.name", "hdfs://localhost:9000");
    System.out.println(FileSystem.get(conf));

    check(conf, "hdfs://localhost:19000/LICENSE.txt");
    check(conf, "hdfs://localhost:19000/test.txt");
    check(conf, "hdfs://localhost:19000/DOESNEXIST.txt");
}

public static void check(Configuration conf, String path) throws Exception {
    try {
        URI uri = URI.create(path);
        System.out.println(path);
        FileSystem file = FileSystem.get(uri, conf);
        Path p = new Path(uri);
        System.out.println(file);

        // Ask the namenode for block locations in the first 128 GB of the file.
        // The long literal (128L) matters: 128*1024*1024*1024 overflows int to 0.
        BlockLocation[] locations = file.getFileBlockLocations(p, 0, 128L * 1024 * 1024 * 1024);
        for (BlockLocation blockLocation : locations) {
            System.out.println(blockLocation);
        }

        // Print at most the first 500 bytes (10 reads into a 50-byte buffer).
        FSDataInputStream in = file.open(p);
        byte[] buffer = new byte[50];
        for (int i = 0; i < 10; i++) {
            int rsz = in.read(buffer, 0, buffer.length);
            if (rsz < 0)
                break;
            System.out.print(new String(buffer, 0, rsz));
        }
        in.close();

        System.out.println("\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
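As an aside, the same information can be gathered in a single listing call. The sketch below is my own (the helper name listLocations is hypothetical, and it assumes the same hdfs://localhost:19000 cluster as above); it uses FileSystem.listFiles, whose LocatedFileStatus entries already carry the block locations:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper, not part of the original program.
public static void listLocations(Configuration conf) throws Exception {
    Path root = new Path("hdfs://localhost:19000/");  // assumed cluster address, as above
    FileSystem fs = FileSystem.get(root.toUri(), conf);
    // listFiles returns LocatedFileStatus, which bundles the block locations
    // with each file status, so one call per directory is enough.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, false);
    while (it.hasNext()) {
        LocatedFileStatus status = it.next();
        for (BlockLocation block : status.getBlockLocations()) {
            System.out.println(status.getPath() + " -> " + Arrays.toString(block.getHosts()));
        }
    }
}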

Replication is 1, and the files were uploaded on the master with:

hadoop fs -put LICENSE.txt /

and on the slave with:

hadoop fs -put test.txt /

It works:

hdfs://localhost:19000/LICENSE.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,15429,MASTER

hdfs://localhost:19000/test.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,4,SLAVE

But this looks like a workaround to me. How do Spark or YARN ask for file locations in order to ship jobs to the datanodes?
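For context, Hadoop's own FileInputFormat does essentially the same thing when it computes input splits: it asks the FileSystem for block locations and hands the returned hosts to the scheduler as preferred locations for each split. Below is a simplified sketch of that idea, not the real FileInputFormat code; it assumes one split per block, and the helper name printPreferredHosts is mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: one split per block, mimicking FileInputFormat's default sizing.
public static void printPreferredHosts(Configuration conf, String pathStr) throws Exception {
    Path path = new Path(pathStr);
    FileSystem fs = FileSystem.get(path.toUri(), conf);
    FileStatus status = fs.getFileStatus(path);
    // Use the file's real length instead of a guessed constant.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    long splitSize = status.getBlockSize();

    for (long offset = 0; offset < status.getLen(); offset += splitSize) {
        for (BlockLocation block : blocks) {
            // The block covering this split's start supplies the preferred hosts.
            if (offset >= block.getOffset()
                    && offset < block.getOffset() + block.getLength()) {
                System.out.println("split@" + offset + " -> "
                        + String.join(",", block.getHosts()));
            }
        }
    }
}

A scheduler (YARN's, or Spark's via its Hadoop RDDs) then tries to place the task for each split on one of those hosts, so getFileBlockLocations is the intended public API for this, not a workaround.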

0 answers:

No answers.