Command executed: hadoop jar /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input /user/hduser/samples/x.txt -output /user/hduser/samples/hadoop_output_data1 -mapper mapper2.py -reducer reducer2.py -file mapper2.py -file reducer2.py
15/10/05 17:37:04 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
packageJobJar: [mapper2.py, reducer2.py] [] /tmp/streamjob1958064018029021376.jar tmpDir=null
15/10/05 17:37:04 INFO client.RMProxy: Connecting to ResourceManager at Hadoop1/192.168.10.2:8050
15/10/05 17:37:04 INFO client.RMProxy: Connecting to ResourceManager at Hadoop1/192.168.10.2:8050
15/10/05 17:37:06 INFO mapred.FileInputFormat: Total input paths to process : 1
15/10/05 17:37:07 INFO mapreduce.JobSubmitter: number of splits:2
15/10/05 17:37:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1444080901412_0001
15/10/05 17:37:08 INFO impl.YarnClientImpl: Submitted application application_1444080901412_0001
15/10/05 17:37:08 INFO mapreduce.Job: The url to track the job: http://Hadoop1:8088/proxy/application_1444080901412_0001/
15/10/05 17:37:08 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hduser/mapred/local]
15/10/05 17:37:08 INFO streaming.StreamJob: Running job: job_1444080901412_0001
15/10/05 17:37:08 INFO streaming.StreamJob: Job running in-process (local Hadoop)
15/10/05 17:37:09 INFO streaming.StreamJob: map 0% reduce 0%
15/10/05 17:37:45 INFO streaming.StreamJob: map 100% reduce 100%
15/10/05 17:37:46 INFO streaming.StreamJob: Job running in-process (local Hadoop)
15/10/05 17:37:46 ERROR streaming.StreamJob: Job not successful. Error: Task failed task_1444080901412_0001_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/10/05 17:37:46 INFO streaming.StreamJob: killJob...
15/10/05 17:37:46 INFO impl.YarnClientImpl: Killed application application_1444080901412_0001
Streaming Command Failed!
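The failed task ID in the log shows which phase failed, because task IDs have the form task_<clusterTimestamp>_<jobNumber>_<m|r>_<taskNumber>. The Python sketch below pulls those IDs out of a streaming log and labels each one as a map or reduce task. It assumes the "Task failed" wording shown in this log; the helper name `failed_tasks` is hypothetical and not part of Hadoop.

```python
import re

# Matches e.g. "Task failed task_1444080901412_0001_m_000001" and captures
# the timestamp, job number, task kind (m or r) and task number.
TASK_RE = re.compile(r"Task failed (task_(\d+)_(\d+)_([mr])_(\d+))")


def failed_tasks(log_text):
    """Return (task_id, job_id, kind) for every 'Task failed' entry in the log."""
    results = []
    for m in TASK_RE.finditer(log_text):
        task_id, ts, job, kind, _ = m.groups()
        job_id = "job_%s_%s" % (ts, job)
        results.append((task_id, job_id, "map" if kind == "m" else "reduce"))
    return results


log = ("15/10/05 17:37:46 ERROR streaming.StreamJob: Job not successful. "
       "Error: Task failed task_1444080901412_0001_m_000001\n"
       "Job failed as tasks failed. failedMaps:1 failedReduces:0\n")
print(failed_tasks(log))
```

Here the `_m_` agrees with `failedMaps:1 failedReduces:0`: a map task failed, so the stderr of that task (reachable from the tracking URL printed in the log) is the most likely place to find the mapper's actual error.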
Mapper file:
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into comma-separated fields
    words = line.split(',')
    # only records with exactly six fields are emitted
    if len(words) == 6:
        # key: the first five fields; value: the sixth field plus a count of 1
        key = ','.join(words[:5])
        value = ','.join([words[5], "1"])
        print('%s\t%s' % (key, value))
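To check the mapper logic without a cluster, the per-line rule can be factored into a plain function and fed sample lines directly. This is a minimal Python 3 sketch; `map_line` and `run_mapper` are hypothetical names, not part of the original script, and the output format is assumed to match what the streaming mapper prints.

```python
import io
import sys


def map_line(line):
    """Apply the mapper's rule to one input line.

    Returns "key<TAB>value", where the key is the first five comma-separated
    fields and the value is the sixth field followed by a count of 1, or
    None when the line does not have exactly six fields.
    """
    words = line.strip().split(',')
    if len(words) != 6:
        return None
    key = ','.join(words[:5])
    value = ','.join([words[5], "1"])
    return '%s\t%s' % (key, value)


def run_mapper(lines, out=sys.stdout):
    """Write one record per well-formed line, as the streaming mapper would."""
    for line in lines:
        record = map_line(line)
        if record is not None:
            out.write(record + "\n")


# Feed one sample line and one malformed line; only the first produces output.
demo = io.StringIO()
run_mapper(["10.10.1.22,10.10.1.13,0,23772,6,9900\n", "bad line\n"], demo)
print(demo.getvalue(), end="")
```

Calling `run_mapper(sys.stdin)` reproduces the streaming behavior; piping the input through the real script (`cat x.txt | python mapper2.py`) is a quick way to see whether the script itself raises an error before submitting the job.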
Sample input file:
10.10.1.22,10.10.1.13,0,23772,6,9900
10.10.1.12,10.10.1.21,55570,0,6,9900
10.10.1.22,10.10.1.13,0,24028,6,9900
10.10.1.21,10.10.1.12,0,46864,6,9900
10.10.1.12,10.10.1.21,56594,0,6,9900
10.10.1.22,10.10.1.13,0,25308,6,9900
10.10.1.12,10.10.1.21,57618,0,6,9900
10.10.1.21,10.10.1.12,0,48144,6,9900
10.10.1.22,10.10.1.13,0,25564,6,9900
10.10.1.12,10.10.1.21,58642,0,6,9900
10.10.1.22,10.10.1.13,0,26844,6,9900
10.10.1.21,10.10.1.12,0,48400,6,9900
10.10.1.12,10.10.1.21,59410,0,6,9900
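Between the mapper and the reducer, Hadoop Streaming sorts the map output by key. Since reducer2.py is not shown, the sketch below stops at that step: it applies the mapper's rule to the first four sample lines, sorts by key, and groups the values per key for inspection. The grouping is only for display (an assumption of this sketch); a streaming reducer actually receives the sorted `key<TAB>value` lines one at a time on stdin.

```python
from itertools import groupby

SAMPLE = """\
10.10.1.22,10.10.1.13,0,23772,6,9900
10.10.1.12,10.10.1.21,55570,0,6,9900
10.10.1.22,10.10.1.13,0,24028,6,9900
10.10.1.21,10.10.1.12,0,46864,6,9900
"""


def map_records(lines):
    # Same rule as the mapper: 6 fields -> (first five fields, "sixth,1").
    for line in lines:
        words = line.strip().split(',')
        if len(words) == 6:
            yield ','.join(words[:5]), ','.join([words[5], "1"])


def shuffle(records):
    # Sort by key as the framework does, then collect each key's values.
    ordered = sorted(records, key=lambda kv: kv[0])
    return [(key, [v for _, v in group])
            for key, group in groupby(ordered, key=lambda kv: kv[0])]


for key, values in shuffle(map_records(SAMPLE.splitlines())):
    print(key, values)
```

In the full sample every key (the first five fields, which include the port numbers) is distinct, so each key ends up with exactly one `9900,1` value.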
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>Hadoop1:54311</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
</configuration>
hdfs-site.xml on the NameNode:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop-2.6.0/hadoop_data/hdfs/namenode</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Hadoop1:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Hadoop1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Hadoop1:8050</value>
  </property>
</configuration>
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Hadoop1:9000</value>
  </property>
</configuration>
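Because these files mix current and deprecated property names, a small script can flag suspicious keys before the next run. This is a Python sketch under stated assumptions: the deprecated-name table is a hand-written partial list holding only `mapred.job.tracker` (named in the job log) and `fs.default.name` (superseded by `fs.defaultFS` in Hadoop 2), and the second check only reports `yarn.nodemanager.aux-services.<service>.class` keys whose `<service>` part is not listed in `yarn.nodemanager.aux-services`. It does not know Hadoop's full deprecation list or defaults, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Partial, hand-written table (not Hadoop's full deprecation list).
DEPRECATED = {
    "mapred.job.tracker": "mapreduce.jobtracker.address",
    "fs.default.name": "fs.defaultFS",
}
AUX_PREFIX = "yarn.nodemanager.aux-services."
AUX_SUFFIX = ".class"


def load_properties(xml_text):
    """Parse a Hadoop *-site.xml <configuration> into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        props[name] = (prop.findtext("value") or "").strip()
    return props


def check(props):
    """Return warnings for deprecated names and unmatched aux-service class keys."""
    warnings = []
    for old, new in DEPRECATED.items():
        if old in props:
            warnings.append("%s is deprecated; use %s" % (old, new))
    listed = props.get(AUX_PREFIX.rstrip("."), "")
    services = {s.strip() for s in listed.split(",") if s.strip()}
    for name in props:
        if name.startswith(AUX_PREFIX) and name.endswith(AUX_SUFFIX):
            service = name[len(AUX_PREFIX):-len(AUX_SUFFIX)]
            if service not in services:
                warnings.append("%s does not match any listed aux service %s"
                                % (name, sorted(services)))
    return warnings


# The relevant parts of the yarn-site.xml and core-site.xml shown in the post.
YARN_SITE = """<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
</configuration>"""
CORE_SITE = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://Hadoop1:9000</value></property>
</configuration>"""

props = {}
for text in (YARN_SITE, CORE_SITE):
    props.update(load_properties(text))
for warning in check(props):
    print(warning)
```

For the snippets in this post it reports `fs.default.name`, and the class key written with `mapreduce.shuffle` (dot) while the service is named `mapreduce_shuffle` (underscore). To check the real files, pass each file's contents read in binary mode, e.g. `load_properties(open(path, "rb").read())`.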