Question

Could you suggest how to resolve this error from FixedLengthInputFormat when running MapReduce: Partial record found at the end of split

I am looking into customizing FileInputFormat for Hive and am studying the FixedLengthInputFormat in the following GitHub directory: https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input

I copied FixedLengthInputFormat and FixedLengthRecordReader, and created a Mapper and a Driver program to test them (0 reducers).

The job is configured to use this input format as follows:

Configuration conf = new Configuration(true);
conf.set("fs.default.name", "file:///");
conf.setInt("fixedlengthinputformat.record.length",50);
job.setInputFormatClass(FixedLengthInputFormat.class);

The data file looks like this (tested with 3 records):

   000yyy022222222xxxxxxx                       11111

The split size is computed as 152 instead of 150, and I get the following error:

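java.lang.Exception: java.io.IOException: Partial record(length = 2) found at the end of split.
INFO customFixed.FixedLengthRecordReader: Expecting 4 records each with a length of 50 bytes in the split with an effective size of 152 bytes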

I am running this from IntelliJ on Windows.

Is there anything wrong with this approach? I would appreciate any suggestion on how to resolve this error.

Thanks.

Answer 1

Please take a look at the FixedLengthInputFormat code available on GitHub.

The basic requirement is that every record has the same length. This means each record in your file must be exactly fixedlengthinputformat.record.length bytes long.

Please verify your input file. I am fairly sure one of your records is longer than 50 bytes, 52 to be precise.
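One way to carry out that check is sketched below in plain Java, without Hadoop. The class RecordLengthCheck and its method names are hypothetical and exist only for illustration. Because FixedLengthInputFormat reads raw bytes with no delimiter handling, any CR or LF in the file counts toward the record length, so the sketch also counts those bytes. A single CRLF on Windows is two bytes, which is one possible (unconfirmed) explanation for the 2 extra bytes reported in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: checks whether a file's size is a
// whole multiple of the configured record length.
final class RecordLengthCheck {

    // Bytes left over after the last complete record; 0 means the size divides evenly.
    static long leftoverBytes(long fileSize, int recordLength) {
        if (recordLength <= 0 || fileSize < 0) {
            throw new IllegalArgumentException("recordLength must be > 0 and fileSize >= 0");
        }
        return fileSize % recordLength;
    }

    // Counts CR and LF bytes, which a fixed-length reader would treat as record data.
    static int countLineTerminatorBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if (b == '\r' || b == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RecordLengthCheck <file> <recordLength>");
            return;
        }
        Path input = Paths.get(args[0]);
        int recordLength = Integer.parseInt(args[1]);
        byte[] data = Files.readAllBytes(input);
        System.out.println("file size (bytes): " + data.length);
        System.out.println("leftover bytes: " + leftoverBytes(data.length, recordLength));
        System.out.println("CR/LF bytes: " + countLineTerminatorBytes(data));
    }
}
```

If the leftover is non-zero and matches the number of CR/LF bytes, removing the line terminators from the file (or padding every record, terminator included, to one common length and configuring that length) should make the size divide evenly.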

The record reader reads 50 bytes at a time; if 2 bytes are left over at the end, they cannot be interpreted as a valid record.
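The behaviour described above can be reproduced with a small stand-alone simulation. This is a sketch, not the Hadoop implementation: FixedRecordScan and its methods are hypothetical names, and the ceiling division in expectedRecords is an assumption chosen because it matches the "Expecting 4 records" line logged for a 152-byte split.

```java
import java.io.ByteArrayInputStream;

// Hypothetical simulation of reading fixed-length records from a byte array.
final class FixedRecordScan {

    // Returns {number of complete records, length of the trailing partial record}.
    static int[] scan(byte[] data, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] record = new byte[recordLength];
        int complete = 0;
        while (true) {
            int filled = 0;
            // Keep reading until the record buffer is full or the input ends.
            while (filled < recordLength) {
                int n = in.read(record, filled, recordLength - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            if (filled < recordLength) {
                // filled == 0 is a clean end; anything else is a partial record.
                return new int[] {complete, filled};
            }
            complete++;
        }
    }

    // Assumed estimate of records in a split: ceiling of splitSize / recordLength.
    static long expectedRecords(long splitSize, int recordLength) {
        return (splitSize + recordLength - 1) / recordLength;
    }

    public static void main(String[] args) {
        int[] result = scan(new byte[152], 50);
        System.out.println("complete records: " + result[0]);
        System.out.println("partial record length: " + result[1]);
        System.out.println("expected records: " + expectedRecords(152, 50));
    }
}
```

With 152 bytes and a record length of 50, the scan finds 3 complete records and a 2-byte remainder, which corresponds to the "Partial record(length = 2)" in the reported exception.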
