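I have an Avro file containing 5 records. I know you can write a MapReduce job that iterates over each record, but is there a way, in the mapper of my Java MapReduce job, to get the byte "length" of each Avro record within the file, so that I can get: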
1) the starting position of each record as it is processed;
2) the length of each record as it is stored in the file, so that I can use Java code to "seek" to the start of a specific Avro record within the file (e.g. the 4th record).
If this is not possible with the current Avro library, that's fine.
The use case is that I want to be able to output a file containing the following:
<Record Number> <StartIndex> <EndIndex>
Record1 0 150
Record2 151 270
...
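As far as I understand the Avro object container file format, per-record offsets are not stored anywhere. After the header, records are serialized back to back inside data blocks (the block payload may be compressed), and each block ends with the file's 16-byte sync marker. So the smallest unit you can seek to is a block, not a record. The Java DataFileReader works at that level too (seek(long), sync(long), previousSync()). A minimal sketch, assuming the whole file fits in memory and using only the JDK plus the container layout from the Avro specification, that lists each block's start offset, inclusive end offset (matching the 0 150 / 151 270 convention above) and record count. The class AvroBlockIndex and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the data blocks of an Avro object container file
// from the container layout alone (no Avro library, no schema needed).
final class AvroBlockIndex {

    /** One data block: first byte offset, record count, last byte offset (inclusive). */
    static final class Block {
        final long start;
        final long recordCount;
        final long end; // last byte of the block's trailing sync marker

        Block(long start, long recordCount, long end) {
            this.start = start;
            this.recordCount = recordCount;
            this.end = end;
        }
    }

    private static final byte[] MAGIC = {'O', 'b', 'j', 1};
    private static final int SYNC_SIZE = 16;

    private final byte[] buf;
    private int pos;

    private AvroBlockIndex(byte[] buf) {
        this.buf = buf;
    }

    /** Reads one zig-zag varint-encoded Avro long and advances the cursor. */
    private long readLong() {
        long n = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xff;
            n |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1);
    }

    /** Skips a length-prefixed Avro string or bytes value. */
    private void skipLengthPrefixed() {
        long len = readLong();
        pos += (int) len;
    }

    static List<Block> scan(byte[] file) {
        if (file.length < MAGIC.length || !Arrays.equals(Arrays.copyOf(file, MAGIC.length), MAGIC)) {
            throw new IllegalArgumentException("not an Avro object container file");
        }
        AvroBlockIndex r = new AvroBlockIndex(file);
        r.pos = MAGIC.length;
        // Header metadata is a map<bytes>: blocks of entries terminated by a zero count.
        for (long count = r.readLong(); count != 0; count = r.readLong()) {
            if (count < 0) { // negative count: use its absolute value; a byte size follows
                count = -count;
                r.readLong();
            }
            for (long i = 0; i < count; i++) {
                r.skipLengthPrefixed(); // key (string)
                r.skipLengthPrefixed(); // value (bytes)
            }
        }
        byte[] sync = Arrays.copyOfRange(file, r.pos, r.pos + SYNC_SIZE);
        r.pos += SYNC_SIZE;

        List<Block> blocks = new ArrayList<>();
        while (r.pos < file.length) {
            long start = r.pos;
            long recordCount = r.readLong();
            long byteSize = r.readLong(); // size of the (possibly compressed) record data
            r.pos += (int) byteSize;
            if (!Arrays.equals(Arrays.copyOfRange(file, r.pos, r.pos + SYNC_SIZE), sync)) {
                throw new IllegalStateException("sync marker mismatch at offset " + r.pos);
            }
            r.pos += SYNC_SIZE;
            blocks.add(new Block(start, recordCount, r.pos - 1));
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = Files.readAllBytes(Paths.get(args[0]));
        int n = 1;
        for (Block b : scan(file)) {
            System.out.println("Block" + n++ + " " + b.start + " " + b.end + " (" + b.recordCount + " records)");
        }
    }
}
```

Run it as e.g. `java AvroBlockIndex records.avro` (the file name is a placeholder). A small file written with default settings will likely put all five records in one block. If you control the writer, calling DataFileWriter.sync() after each append forces the current block to end, so each record gets its own block and the block offsets above become record offsets. A block start offset should then be usable with DataFileReader.seek(long); treat that as something to verify against your Avro version.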