I need to develop my own job executor that exploits datanode locality (it is not homework).
I have a Hadoop 2.7.1 cluster with 2 datanodes.
(See http://jugsi.blogspot.it/2017/08/configuring-hadoop-271-on-windows-w-ssh.html)
My code:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //conf.set("fs.default.name", "hdfs://localhost:9000");
    System.out.println(FileSystem.get(conf));
    check(conf, "hdfs://localhost:19000/LICENSE.txt");
    check(conf, "hdfs://localhost:19000/test.txt");
    check(conf, "hdfs://localhost:19000/DOESNEXIST.txt");
}

public static void check(Configuration conf, String path) throws Exception {
    try {
        URI uri = URI.create(path);
        System.out.println(path);
        FileSystem file = FileSystem.get(uri, conf);
        Path p = new Path(uri);
        System.out.println(file);
        // ask for all block locations; the length needs a long literal,
        // otherwise 128*1024*1024*1024 overflows int
        BlockLocation[] locations = file.getFileBlockLocations(p, 0, 128L * 1024 * 1024 * 1024);
        for (BlockLocation blockLocation : locations) {
            System.out.println(blockLocation);
        }
        FSDataInputStream in = file.open(p);
        byte[] buffer = new byte[50];
        for (int i = 0; i < 10; i++) { // only print the first 500 bytes of the file
            int rsz = in.read(buffer, 0, buffer.length);
            if (rsz < 0)
                break;
            System.out.print(new String(buffer, 0, rsz));
        }
        System.out.println("\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
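For reference, a variant that avoids the hard-coded 128 GB length: my understanding is that the length can simply be taken from the file's own FileStatus. A minimal sketch of that idea, reusing the same FileSystem and Path as above (nothing Spark-specific assumed):

// Sketch: query block locations using the file's real length from its FileStatus.
FileStatus status = file.getFileStatus(p);
BlockLocation[] locs = file.getFileBlockLocations(status, 0, status.getLen());
for (BlockLocation loc : locs) {
    // getHosts() returns the datanode hostnames that hold this block
    System.out.println(loc.getOffset() + "," + loc.getLength()
            + " on " + String.join(",", loc.getHosts()));
}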
Replication is 1, and on the master:
hadoop fs -put LICENSE.txt /
and on the slave:
hadoop fs -put test.txt /
It works:
hdfs://localhost:19000/LICENSE.txt DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]] 0,15429,MASTER
and
hdfs://localhost:19000/test.txt DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]] 0,4,SLAVE
But to me this looks like a workaround. How does Spark or YARN ask for the file locations in order to ship (move) the job to the datanode?
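As far as I can tell, the FileInputFormat-style split computation does essentially the same lookup, just through listLocatedStatus, and the hosts attached to each block are what the scheduler uses to place tasks. A rough sketch of that pattern (not the actual Spark/YARN code, just the Hadoop FileSystem API):

// Rough sketch: collect block -> host mappings, the raw input for locality-aware scheduling.
static void printBlockHosts(FileSystem fs, Path dir) throws IOException {
    RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
    while (it.hasNext()) {
        LocatedFileStatus f = it.next();
        for (BlockLocation b : f.getBlockLocations()) {
            // the hosts holding this block are the preferred nodes for a task reading it
            System.out.println(f.getPath() + " [" + b.getOffset() + "," + b.getLength()
                    + "] on " + String.join(",", b.getHosts()));
        }
    }
}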