I am using the following code to read bytes from a file:
    FileSystem fs = config.getHDFS();
    try {
        Path path = new Path(dirName + '/' + fileName);
        byte[] bytes = new byte[(int) fs.getFileStatus(path).getLen()];
        in = fs.open(path);
        in.read(bytes);
        result = new DataInputStream(new ByteArrayInputStream(bytes));
    } catch (Exception e) {
        e.printStackTrace();
        if (in != null) {
            try {
                in.close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        }
    }
There are about 15,000 files in the directory I am reading from. After a certain point, I get this exception on the in.read(bytes) line:
2012-05-31 14:11:45,477 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:115)
at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:427)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:725)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
at java.io.DataInputStream.read(DataInputStream.java:83)
Another exception that gets thrown is:
2012-05-31 15:09:14,849 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue
java.net.SocketException: No buffer space available (maximum connections reached?): connect
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:719)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
at java.io.DataInputStream.read(DataInputStream.java:83)
Please advise what might be going wrong.
Answer 0 (score: 3)
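You are ignoring the return value of in.read and assuming that you can read the whole file in one go. Don't do that. Loop until read returns -1, or until you have read as much data as you need. It's also not clear to me that you should trust getLen() like this: what happens if the file grows (or shrinks) between the two calls?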
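If you do want to keep a pre-sized array, the loop described above could look roughly like the sketch below. The class and method names (FixedLengthRead, readUntilFullOrEof) are made up for illustration and are not part of Hadoop or the JDK; the only library call relied on is InputStream.read(byte[], int, int).

```java
import java.io.IOException;
import java.io.InputStream;

final class FixedLengthRead {
    /**
     * Fills the buffer by calling read() repeatedly, because a single call
     * may return fewer bytes than requested.
     * Returns the number of bytes actually read, which is smaller than
     * buffer.length only if the stream ended early.
     */
    static int readUntilFullOrEof(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // Ask only for the bytes still missing, starting where we left off.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream before the buffer was full
            }
            total += n;
        }
        return total;
    }
}
```

The caller should compare the returned count with the expected length, since a shorter result means the file was smaller than getLen() reported.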
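I would suggest creating a ByteArrayOutputStream to write into and a small (16K?) buffer as temporary storage, then looping: read into the buffer, write that many bytes to the output stream, lather, rinse, repeat until read returns -1 to indicate the end of the stream. You can then get the data out of the ByteArrayOutputStream and put it into a ByteArrayInputStream as before.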
EDIT: Quick code, untested. There is similar (better) code in Guava, by the way.
    public static byte[] readFully(InputStream stream) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[16 * 1024];
        int bytesRead;
        // read() returns -1 at end of stream; each call may fill only part of the buffer.
        while ((bytesRead = stream.read(buffer)) > 0) {
            // Copy only the bytes that were actually read in this pass.
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }
Then use it like this:
    in = fs.open(path);
    byte[] data = readFully(in);
    result = new DataInputStream(new ByteArrayInputStream(data));
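Also note that you should close your stream in a finally block, not only when an exception is thrown. I would also recommend not catching Exception itself.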
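As a rough sketch of that last point, assuming Java 7 or later: try-with-resources closes the stream on every path, which is equivalent to closing it in a finally block. StreamOpener here is a hypothetical stand-in for a call such as fs.open(path), so the example does not depend on Hadoop; ClosingRead and load are likewise illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClosingRead {
    /** Hypothetical stand-in for something like fs.open(path). */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    /**
     * Reads the whole stream into memory and always closes it.
     * IOException is declared and propagated rather than being
     * swallowed by a broad catch (Exception e).
     */
    static DataInputStream load(StreamOpener opener) throws IOException {
        // The stream is closed when this block exits, normally or via an exception.
        try (InputStream in = opener.open()) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[16 * 1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) > 0) {
                baos.write(buffer, 0, bytesRead);
            }
            return new DataInputStream(new ByteArrayInputStream(baos.toByteArray()));
        }
    }
}
```

With the original code, a stream is only closed when an exception happens, so reading around 15,000 files can leave that many streams open, which would fit the "maximum connections reached?" message in the second stack trace.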