Question

I'm not sure I understand how TextInputFormat works. The documentation says:

An InputFormat for plain text files. Files are broken into lines.

So I assumed that if I simply convert the value my map function receives to a String, I get one line of my file as a string.

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString(); // one line of my input file?
        ...
    }

However, when I process the line further, it turns out that it is not actually a single line of my file. My file city.dat looks like this:

    Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51
    Canillo|ad|Canillo|3292|42.57|1.6
    ...
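Here is a minimal plain-Java sketch of splitting one such record into its fields. It assumes the six fields are always separated by `|`. `CityLineParser` and `parse` are hypothetical names, and the method avoids Hadoop types so it runs standalone; inside `map()` it would be applied to `value.toString()`. One thing to watch: `String.split` takes a regular expression, and there an unescaped `|` matches the empty string. As a result, `line.split("|")` would break the line into single characters instead of fields.

```java
import java.util.Arrays;

public class CityLineParser {
    // Splits one city.dat record into its pipe-delimited fields.
    // '|' is the regex alternation operator, so it must be escaped as "\\|".
    // The limit -1 keeps trailing empty fields instead of dropping them.
    static String[] parse(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String line = "Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51";
        String[] fields = parse(line);
        // Prints the number of fields followed by the fields themselves.
        System.out.println(fields.length);
        System.out.println(Arrays.toString(fields));
    }
}
```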

Can anyone tell me how to process this file line by line in the map function?

Answer 1

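TextInputFormat is the InputFormat for plain text files. Files are broken into lines, and either a line feed or a carriage return signals the end of a line. The key is the position in the file and the value is the line of text. If your lines end with something other than a line feed or carriage return, you have to write your own InputFormat.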
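The following plain-Java sketch makes the key/value contract concrete by mimicking the pairs TextInputFormat produces for a small input. Each key is the byte offset where a line starts, and each value is the line without its terminator (LF, CR, or CRLF). This is an illustration only, not Hadoop's actual record reader: it ignores input splits entirely. `LineOffsets` and `split` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {
    // One (key, value) pair: byte offset of the line start, and the line text.
    static final class Line {
        final long offset;
        final String text;

        Line(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }
    }

    // Splits raw bytes on LF, CR, or CRLF, recording where each line starts.
    static List<Line> split(byte[] data) {
        List<Line> lines = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                lines.add(new Line(start, new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Treat CRLF as a single terminator rather than two.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        // A last line without a trailing terminator is still emitted.
        if (start < data.length) {
            lines.add(new Line(start, new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return lines;
    }

    public static void main(String[] args) {
        String file = "Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51\n"
                + "Canillo|ad|Canillo|3292|42.57|1.6\n";
        for (Line l : split(file.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(l.offset + "\t" + l.text);
        }
    }
}
```

With the two sample records from city.dat, this prints offsets 0 and 54: the first record is 53 bytes long, plus one byte for the LF.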

See point no. 3 of this blog post; it definitely breaks lines at the end of each line: http://blog.cloudera.com/blog/2011/01/lessons-learned-from-clouderas-hadoop-developer-training-course/

I suggest opening your file in a text editor such as UltraEdit and checking which line breaks it actually contains.
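If no such editor is at hand, a small plain-Java sketch can count the terminators directly. It assumes the file path is passed as the first command-line argument, and `LineEndingCheck` and `count` are hypothetical names. If all three counts are zero for a file with several records, the records are separated by something TextInputFormat does not treat as an end of line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineEndingCheck {
    // Returns {CRLF pairs, lone CR bytes, lone LF bytes} found in the data.
    static int[] count(byte[] data) {
        int crlf = 0, cr = 0, lf = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, cr, lf};
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LineEndingCheck <file>");
            return;
        }
        Path path = Paths.get(args[0]);
        int[] c = count(Files.readAllBytes(path));
        System.out.printf("CRLF: %d, lone CR: %d, lone LF: %d%n", c[0], c[1], c[2]);
    }
}
```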

See if that helps.

Hadoop MapReduce: TextInputFormat and processing lines?
