What is the correct way to find out where a file is located on an HDFS cluster?

Time: 2017-08-18 13:49:14

Tags: java hadoop hdfs hadoop-partitioning

I need to develop my own job executor that exploits datanode locality (this is not homework).

I have a Hadoop 2.7.1 cluster with 2 datanodes.

(See http://jugsi.blogspot.it/2017/08/configuring-hadoop-271-on-windows-w-ssh.html for the cluster setup.)

My code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //conf.set("fs.default.name", "hdfs://localhost:9000");
    System.out.println(FileSystem.get(conf));

    check(conf, "hdfs://localhost:19000/LICENSE.txt");
    check(conf, "hdfs://localhost:19000/test.txt");
    check(conf, "hdfs://localhost:19000/DOESNEXIST.txt");
}

public static void check(Configuration conf, String path) throws Exception {
    try {
        URI uri = URI.create(path);
        System.out.println(path);
        FileSystem file = FileSystem.get(uri, conf);
        Path p = new Path(uri);
        System.out.println(file);

        // Ask the namenode for block locations in the first 128 GB of the file.
        // The long literal (128L) matters: 128*1024*1024*1024 overflows int to 0.
        BlockLocation[] locations = file.getFileBlockLocations(p, 0, 128L * 1024 * 1024 * 1024);
        for (BlockLocation blockLocation : locations) {
            System.out.println(blockLocation);
        }

        // Print at most the first 500 bytes (10 reads into a 50-byte buffer).
        FSDataInputStream in = file.open(p);
        byte[] buffer = new byte[50];
        for (int i = 0; i < 10; i++) {
            int rsz = in.read(buffer, 0, buffer.length);
            if (rsz < 0)
                break;
            System.out.print(new String(buffer, 0, rsz));
        }
        in.close();

        System.out.println("\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
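As an aside, the same information can be gathered in a single listing call. The sketch below is my own (the helper name listLocations is hypothetical, and it assumes the same hdfs://localhost:19000 cluster as above); it uses FileSystem.listFiles, whose LocatedFileStatus entries already carry the block locations:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper, not part of the original program.
public static void listLocations(Configuration conf) throws Exception {
    Path root = new Path("hdfs://localhost:19000/");  // assumed cluster address, as above
    FileSystem fs = FileSystem.get(root.toUri(), conf);
    // listFiles returns LocatedFileStatus, which bundles the block locations
    // with each file status, so one call per directory is enough.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, false);
    while (it.hasNext()) {
        LocatedFileStatus status = it.next();
        for (BlockLocation block : status.getBlockLocations()) {
            System.out.println(status.getPath() + " -> " + Arrays.toString(block.getHosts()));
        }
    }
}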

Replication is 1, and the files were uploaded on the master with:

hadoop fs -put LICENSE.txt /

and on the slave with:

hadoop fs -put test.txt /

It works:

hdfs://localhost:19000/LICENSE.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,15429,MASTER

hdfs://localhost:19000/test.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,4,SLAVE

But this looks like a workaround to me. How do Spark or YARN ask for file locations in order to ship jobs to the datanodes?
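For context, Hadoop's own FileInputFormat does essentially the same thing when it computes input splits: it asks the FileSystem for block locations and hands the returned hosts to the scheduler as preferred locations for each split. Below is a simplified sketch of that idea, not the real FileInputFormat code; it assumes one split per block, and the helper name printPreferredHosts is mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: one split per block, mimicking FileInputFormat's default sizing.
public static void printPreferredHosts(Configuration conf, String pathStr) throws Exception {
    Path path = new Path(pathStr);
    FileSystem fs = FileSystem.get(path.toUri(), conf);
    FileStatus status = fs.getFileStatus(path);
    // Use the file's real length instead of a guessed constant.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    long splitSize = status.getBlockSize();

    for (long offset = 0; offset < status.getLen(); offset += splitSize) {
        for (BlockLocation block : blocks) {
            // The block covering this split's start supplies the preferred hosts.
            if (offset >= block.getOffset()
                    && offset < block.getOffset() + block.getLength()) {
                System.out.println("split@" + offset + " -> "
                        + String.join(",", block.getHosts()));
            }
        }
    }
}

A scheduler (YARN's, or Spark's via its Hadoop RDDs) then tries to place the task for each split on one of those hosts, so getFileBlockLocations is the intended public API for this, not a workaround.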

0 answers:

No answers.