MapReduce output from Hadoop Streaming differs from the output of the equivalent bash command

Date: 2019-06-02 17:27:02

Tags: bash mapreduce hadoop-streaming

I installed Hadoop as a Docker container (https://hub.docker.com/r/sequenceiq/hadoop-docker).

Here is the HDFS listing of the input directory:

$ bin/hdfs dfs -ls input
Found 31 items
-rw-r--r--   1 root supergroup       4436 2015-07-22 11:17 input/capacity-scheduler.xml
-rw-r--r--   1 root supergroup       1335 2015-07-22 11:17 input/configuration.xsl
-rw-r--r--   1 root supergroup        318 2015-07-22 11:17 input/container-executor.cfg
-rw-r--r--   1 root supergroup        155 2015-07-22 11:17 input/core-site.xml
-rw-r--r--   1 root supergroup        154 2015-07-22 11:17 input/core-site.xml.template
-rw-r--r--   1 root supergroup       3670 2015-07-22 11:17 input/hadoop-env.cmd
-rw-r--r--   1 root supergroup       4302 2015-07-22 11:17 input/hadoop-env.sh
-rw-r--r--   1 root supergroup       2490 2015-07-22 11:17 input/hadoop-metrics.properties
-rw-r--r--   1 root supergroup       2598 2015-07-22 11:17 input/hadoop-metrics2.properties
-rw-r--r--   1 root supergroup       9683 2015-07-22 11:17 input/hadoop-policy.xml
-rw-r--r--   1 root supergroup        126 2015-07-22 11:17 input/hdfs-site.xml
-rw-r--r--   1 root supergroup       1449 2015-07-22 11:17 input/httpfs-env.sh
-rw-r--r--   1 root supergroup       1657 2015-07-22 11:17 input/httpfs-log4j.properties
-rw-r--r--   1 root supergroup         21 2015-07-22 11:17 input/httpfs-signature.secret
-rw-r--r--   1 root supergroup        620 2015-07-22 11:17 input/httpfs-site.xml
-rw-r--r--   1 root supergroup       3518 2015-07-22 11:17 input/kms-acls.xml
-rw-r--r--   1 root supergroup       1527 2015-07-22 11:17 input/kms-env.sh
-rw-r--r--   1 root supergroup       1631 2015-07-22 11:17 input/kms-log4j.properties
-rw-r--r--   1 root supergroup       5511 2015-07-22 11:17 input/kms-site.xml
-rw-r--r--   1 root supergroup      11237 2015-07-22 11:17 input/log4j.properties
-rw-r--r--   1 root supergroup        951 2015-07-22 11:17 input/mapred-env.cmd
-rw-r--r--   1 root supergroup       1383 2015-07-22 11:17 input/mapred-env.sh
-rw-r--r--   1 root supergroup       4113 2015-07-22 11:17 input/mapred-queues.xml.template
-rw-r--r--   1 root supergroup        138 2015-07-22 11:17 input/mapred-site.xml
-rw-r--r--   1 root supergroup        758 2015-07-22 11:17 input/mapred-site.xml.template
-rw-r--r--   1 root supergroup         10 2015-07-22 11:17 input/slaves
-rw-r--r--   1 root supergroup       2316 2015-07-22 11:17 input/ssl-client.xml.example
-rw-r--r--   1 root supergroup       2268 2015-07-22 11:17 input/ssl-server.xml.example
-rw-r--r--   1 root supergroup       2250 2015-07-22 11:17 input/yarn-env.cmd
-rw-r--r--   1 root supergroup       4567 2015-07-22 11:17 input/yarn-env.sh
-rw-r--r--   1 root supergroup       1525 2015-07-22 11:17 input/yarn-site.xml

I then ran the following Hadoop Streaming command:

$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.0.jar \
    -input input \
    -output test \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
$ bin/hdfs dfs -cat test/*
   2060    7737   78618

And here is the bash pipeline I compared it with:

$ bin/hdfs dfs -cat input/* | cat | sort | wc
    729    2588   25574
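To put the two byte counts (third column of the wc output) in context, one can total the size column of the HDFS listing above. Below is a minimal awk sketch, assuming the listing has exactly the format shown (regular-file lines start with "-" and the size in bytes is field 5); the function name sum_listing_sizes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the file-size column (field 5) of an
# `hdfs dfs -ls` listing read from stdin. Only lines whose first field
# starts with "-" (regular files) are counted, so the "Found N items"
# header line is skipped. Prints 0 when no file lines are present.
sum_listing_sizes() {
  awk '$1 ~ /^-/ { total += $5 } END { print total + 0 }'
}

# Usage against the live cluster (not executed here):
#   bin/hdfs dfs -ls input | sum_listing_sizes
# The built-in `bin/hdfs dfs -du -s input` reports a similar summary.

# Self-contained demo on three lines copied from the listing above.
sum_listing_sizes <<'EOF'
Found 31 items
-rw-r--r--   1 root supergroup       4436 2015-07-22 11:17 input/capacity-scheduler.xml
-rw-r--r--   1 root supergroup       1335 2015-07-22 11:17 input/configuration.xsl
-rw-r--r--   1 root supergroup        318 2015-07-22 11:17 input/container-executor.cfg
EOF
```

The demo prints 6089 (4436 + 1335 + 318). Applied to all 31 entries in the listing, the same arithmetic gives 76717 bytes, which matches neither the 78618 bytes reported for the streaming job nor the 25574 bytes reported for the bash pipeline.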

I think this issue is unrelated to the one in this link: Hadoop MapReduce Streaming output different from the output of running MapReduce locally

Why are the results different?
