Here is my question. The Apache Hadoop documentation gives the following example command for Hadoop Streaming:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc
Now, I feed a text file to this streaming job. Say the text file contains only the following two lines:
This is line1
It becomes line2
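The Hadoop Streaming command runs perfectly, with no problems.
However, although I have read the material linked above and other examples on the Internet many times, I cannot answer the following questions. Let's say there is only one mapper and one reducer:
I understand these are very basic questions, but I get stuck again and again and cannot find the right answers. I would appreciate your help.
Thanks.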
Answer 0 (score: 0)
In the case of the above two lines, what would be the key and what would be the value?
The key is the byte offset of the line within the file. The value is the entire text of the line.
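To make the offset/line pairing concrete, here is a minimal Python sketch that mimics how a line-oriented input format would turn the two-line file into records. The function name `text_records` is made up for illustration and is not part of Hadoop. As far as I know, Streaming with the default text input format then passes only the line (the value) to the mapper's stdin, not the offset.

```python
# Sketch only: imitates (byte offset, line) records for a text file.
# text_records is a hypothetical helper, not a Hadoop API.
def text_records(data: bytes):
    offset = 0
    records = []
    for raw in data.splitlines(keepends=True):
        # key = byte offset where the line starts, value = line without newline
        records.append((offset, raw.rstrip(b"\r\n").decode("utf-8")))
        offset += len(raw)  # advance past the line, including its newline
    return records

sample = b"This is line1\nIt becomes line2\n"
records = text_records(sample)
for key, value in records:
    print(key, repr(value))
# prints:
# 0 'This is line1'
# 14 'It becomes line2'
```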
Mappers act on both keys and values.
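With /bin/cat as the mapper, the output will be the same as the input, I believe. Streaming then splits each output line at the first tab, and since these lines contain no tab, the whole line becomes the key and the value is null, so you get (line, null) for every line.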
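The Streaming documentation describes the default rule for mapper output: everything up to the first tab is the key, the rest is the value, and a line with no tab is all key with a null value. A rough Python sketch of that rule, where `split_key_value` is a hypothetical helper name and not a Hadoop function:

```python
# Sketch only: the default key/value split applied to one mapper output line.
def split_key_value(line: str, sep: str = "\t"):
    key, found, value = line.partition(sep)
    # No separator in the line: the whole line is the key, the value is null.
    return (key, value if found else None)

for line in ["This is line1", "It becomes line2", "word\t1"]:
    print(split_key_value(line))
# prints:
# ('This is line1', None)
# ('It becomes line2', None)
# ('word', '1')
```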
/bin/wc does not run once per unique key. In Streaming, the reducer process reads all the records sent to it on stdin, sorted by key, as a single stream. So with one reducer you get only one result: the line, word, and byte counts over both lines taken together.
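If it helps, the whole job can be imitated locally. The sketch below is plain Python, not Hadoop: `mapper_cat`, `shuffle_sort`, and `reducer_wc` are made-up names standing in for /bin/cat, the sort between map and reduce, and /bin/wc. It assumes one mapper and one reducer, and it ignores any separator characters a real cluster might add to each record, so the byte count on a real run may differ.

```python
# Sketch only: local imitation of "cat | sort | wc" with one mapper and one reducer.
def mapper_cat(lines):
    # /bin/cat echoes every input line unchanged.
    return list(lines)

def shuffle_sort(lines):
    # Between map and reduce, records are sorted by key (here: the whole line).
    return sorted(lines)

def reducer_wc(lines):
    # Like wc, count lines, words, and bytes over the entire stream at once.
    text = "".join(line + "\n" for line in lines)
    return (len(lines), len(text.split()), len(text.encode("utf-8")))

input_lines = ["This is line1", "It becomes line2"]
result = reducer_wc(shuffle_sort(mapper_cat(input_lines)))
print(result)  # prints (2, 6, 31)
```

The point is that the reducer sees one sorted stream and produces one set of counts, not one count per key.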