Question

I read that there is no random read/write in Hadoop HDFS. However, the write methods in DFSOutputStream take these arguments:

void write(byte buf[], int off, int len)
void write(int b)

Similarly, the read methods in DFSInputStream take these arguments:

int read(byte buf[], int off, int len)

int read()

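An offset parameter appears in both the read and write calls of HDFS. Why is it needed if the MapReduce framework only ever adds data at the end of a file? How is the "offset" parameter used in HDFS? Are HDFS writes always append-only?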

Answer 1

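The parameter int off does not represent a random position in the input file. It is an offset into the byte[] array: the index inside the array at which the data starts being stored, for up to len bytes. For example, suppose you have written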

byte[] buf = new byte[15];
in.read(buf, 5, 10);

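This reads data from the beginning of the input file, not from the file's 5th byte. The bytes that are read are stored in buf starting at index 5, so at most buf[5] through buf[14] are filled (5 + 10 = 15).

To cross-check, try a few different values for off. Whatever value you pass for off, the data is always read from the beginning of the file (unless you have explicitly called seek).

One point to note here is that the size of the array must not be less than off + len.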

Run this example to get a clear understanding:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHdfsFile {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/core-site.xml"));
        conf.addResource(new Path("/Users/miqbal1/hadoop-eco/hadoop-1.1.2/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path("/demo.txt"));

        // Fill the array b1 starting at index 5; indices 0-4 keep their default value 0
        int charPos = 0;
        byte[] b1 = new byte[10];
        int bytesRead = in.read(b1, 5, 5);
        System.out.println("Bytes Read : " + bytesRead);
        String s = new String(b1, "UTF-8");
        System.out.println("Printing char by char(you'll see first 5 bytes as blank)...");
        for (char c : s.toCharArray()) {
            System.out.println("Character " + ++charPos + " : " + c);
        }
        System.out.println();
        System.out.println("Changing offset value....");

        // Rewind to the start of the file, then fill the array b2 starting at index 10
        in.seek(0);
        charPos = 0;
        byte[] b2 = new byte[15];
        bytesRead = in.read(b2, 10, 5);
        System.out.println("Bytes Read : " + bytesRead);
        s = new String(b2, "UTF-8");
        System.out.println("Printing char by char(you'll see first 10 bytes as blank)...");
        for (char c : s.toCharArray()) {
            System.out.println("Character " + ++charPos + " : " + c);
        }
        System.out.println("DONE!!!");
        in.close();
        fs.close();
    }
}
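If no cluster is at hand, the meaning of off can also be checked with plain JDK streams, since read(byte[], int, int) follows the same java.io.InputStream contract. The sketch below is not HDFS code: it assumes an in-memory ByteArrayInputStream as a stand-in for the file, and the class name BufferOffsetDemo and the helper readAtOffset are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class BufferOffsetDemo {

    // Hypothetical helper: reads up to len bytes from the START of source
    // and stores them in a fresh buffer beginning at index off.
    static byte[] readAtOffset(byte[] source, int bufSize, int off, int len) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        byte[] buf = new byte[bufSize];
        in.read(buf, off, len); // off indexes into buf, not into the stream
        return buf;
    }

    public static void main(String[] args) {
        byte[] source = "HELLOWORLD-HDFS".getBytes(StandardCharsets.US_ASCII);

        // buf[0..4] stay 0; buf[5..14] receive the first 10 bytes of the source.
        byte[] buf = readAtOffset(source, 15, 5, 10);
        System.out.println(new String(buf, 5, 10, StandardCharsets.US_ASCII)); // HELLOWORLD

        // A buffer smaller than off + len is rejected.
        try {
            readAtOffset(source, 10, 5, 10);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Buffer too small for off + len");
        }
    }
}
```

Changing off from 5 to any other valid value moves where the bytes land in buf, but the bytes themselves are still the first ones of the source.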

HTH

Answer 2

"bytesRead = in.read(b2, 10, 5);" is just one of the read methods of FSDataInputStream. Another one, in.read(position, buffer, offset, len), supports random reads. You can also refer to the random read case in TestDFSIO.
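The difference between a file position and a buffer offset can be tried locally as well. The sketch below is only an analogy, not HDFS code: it swaps in the JDK's FileChannel.read(ByteBuffer, long), which likewise takes an absolute file position and leaves the channel's own position untouched. The class name PositionalReadAnalogy, the helper positionalRead and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadAnalogy {

    // Hypothetical helper: reads up to len bytes starting at absolute file
    // position `position` and stores them in buf beginning at index off.
    // Returns the channel position afterwards to show that it did not move.
    static long positionalRead(Path file, long position, byte[] buf, int off, int len)
            throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.wrap(buf, off, len); // window buf[off .. off+len)
            ch.read(dst, position);                          // position refers to the file
            return ch.position();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, "0123456789ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII));

        byte[] buf = new byte[15];
        long posAfter = positionalRead(file, 10, buf, 5, 5);

        // File bytes 10..14 ("ABCDE") end up in buf[5..9].
        System.out.println(new String(buf, 5, 5, StandardCharsets.US_ASCII));
        System.out.println("Channel position after read: " + posAfter);

        Files.delete(file);
    }
}
```

Here position selects where in the file the read starts, while off still only selects where in the array the bytes are stored, which is the same split of roles as in the four-argument read mentioned above.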

By design, HDFS does not support random writes.

Random read/write in HDFS
