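I am using Hadoop 2.6, and I have a cluster of virtual machines on which I installed HDFS. I am trying to read a file in my HDFS remotely, from some Java code running on my local machine, in the basic way, with a BufferedReader: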
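    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class HDFSAccessTest {
        public static void main(String[] args) {
            FileSystem fs = null;
            String hadoopLocalPath = "/path/to/my/hadoop/local/folder/etc/hadoop";
            // Load the cluster's client configuration from the local Hadoop folder.
            Configuration hConf = new Configuration();
            hConf.addResource(new Path(hadoopLocalPath + File.separator + "core-site.xml"));
            hConf.addResource(new Path(hadoopLocalPath + File.separator + "hdfs-site.xml"));
            try {
                fs = FileSystem.get(URI.create("hdfs://10.0.0.1:54310/"), hConf);
            } catch (IOException e1) {
                e1.printStackTrace();
                System.exit(-1);
            }
            Path startPath = new Path("/user/myuser/path/to/my/file.txt");
            try {
                FileStatus[] fileStatus = fs.listStatus(startPath);
                Path[] paths = FileUtil.stat2Paths(fileStatus);
                for (Path path : paths) {
                    // try-with-resources closes the reader even if readLine() throws.
                    try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)))) {
                        String line;
                        while ((line = br.readLine()) != null) {
                            System.out.println(line);
                        }
                    }
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }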
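The program can access HDFS correctly (no exception is raised). If I ask it to list files and directories through code, it reads them without any problem.
Now, the issue is that if I try to read a file (as in the code shown), it gets stuck for a while during the read, until it raises a BlockMissingException: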
    org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2005327120-10.1.1.55-1467731650291:blk_1073741836_1015 file=/user/myuser/path/to/my/file.txt
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:888)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at uk.ou.kmi.med.datoolkit.tests.access.HDFSAccessTest.main(HDFSAccessTest.java:55)
Things I already know:
Answer 0 (score: 0)
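Can you make sure that the DataNodes are also reachable from the client? I ran into a similar problem when connecting to a Hadoop cluster configured in AWS. I was able to resolve it by confirming the connection between all of the DataNodes and my client system.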
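
As a rough way to run that check from the client machine, here is a minimal sketch that uses only the Java standard library to try opening a TCP connection to each DataNode. The class name DataNodeReachability and the helper isReachable are hypothetical, and the "host:port" arguments are placeholders for whatever addresses your NameNode actually reports. Port 50010 is the default DataNode data transfer port (dfs.datanode.address) in Hadoop 2.x, so it is only right if your cluster has not overridden it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop: probes TCP reachability of DataNodes.
public class DataNodeReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Each argument is assumed to be a "host:port" pair, e.g. 10.0.0.2:50010.
        for (String target : args) {
            int sep = target.lastIndexOf(':');
            if (sep < 0) {
                System.out.println("Skipping '" + target + "': expected host:port");
                continue;
            }
            String host = target.substring(0, sep);
            int port = Integer.parseInt(target.substring(sep + 1));
            System.out.println(target + " reachable=" + isReachable(host, port, 3000));
        }
    }
}
```

If the NameNode port answers but the DataNode ports do not, that would be consistent with the symptom in the question: listing only needs the NameNode, while reading a block needs a direct connection to a DataNode. In that case the firewall or security-group rules are the first thing to check. It may also be worth checking whether the DataNodes are advertised under addresses the client cannot route to; the HDFS client property dfs.client.use.datanode.hostname exists for that situation, though whether it applies here is an assumption.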