Streaming Command Failed! Error in Hadoop

Time: 2020-06-23 12:19:15

Tags: python hadoop mapreduce reducers mapper

I'm getting the error "Streaming Command Failed!" when I try to run mapper.py and reducer.py on some data. The mapper and reducer run, but the streaming job fails.

Here is the mapper code:

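#!/usr/bin/python
import sys

# Emit "<first CSV column>\t1" for each input line; Hadoop streaming
# treats the text before the first tab as the key.
for line in sys.stdin:
    data = line.strip().split(",")
    key = data[0]
    value = 1
    print("{0}\t{1}".format(key, value))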
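To check the mapping logic without a cluster, the same loop can be pulled into a function and fed a list of lines. This is a minimal sketch, not the code from the question: the names `map_lines` and `run_mapper` are hypothetical, and skipping blank lines is an added assumption (otherwise a blank line is emitted as an empty key).

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def map_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (first CSV column, 1) for each non-blank line."""
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines are skipped instead of emitting an empty key.
            continue
        yield line.split(",")[0], 1


def run_mapper(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Write mapper output in Hadoop streaming's default key<TAB>value form."""
    for key, value in map_lines(stdin):
        stdout.write("{0}\t{1}\n".format(key, value))
```

Calling `run_mapper()` at the bottom of mapper.py would reproduce the original output for non-blank lines, while `map_lines` can be unit-tested on a plain list of strings.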

Here is the reducer code:

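#!/usr/bin/python
import sys

total = 0
oldkey = None

# Hadoop sorts mapper output by key, so equal keys arrive on adjacent lines.
for line in sys.stdin:
    # Strip only the newline and split once, so an empty key ("\t1") still
    # yields two fields instead of raising IndexError.
    data = line.rstrip("\n").split("\t", 1)
    thiskey = data[0]
    value = data[1]

    # Key changed: emit the finished total for the previous key and reset.
    if oldkey is not None and thiskey != oldkey:
        print("{0}\t{1}".format(oldkey, total))
        total = 0

    oldkey = thiskey
    total += float(value)

# Emit the total for the last key.
if oldkey is not None:
    print("{0}\t{1}".format(oldkey, total))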
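The reducer relies on Hadoop sorting the mapper output by key before it reaches the reducer, so all lines for one key are consecutive. The same per-key sum can be written more compactly with `itertools.groupby`. In this sketch the function names `parse_pairs` and `sum_by_key` are hypothetical, and `sorted()` stands in for Hadoop's shuffle/sort phase when testing locally.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Parse 'key<TAB>value' lines; lines without a tab are skipped."""
    for line in lines:
        # Only strip the newline so an empty key ("\t1") is not lost.
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2:
            yield parts[0], float(parts[1])


def sum_by_key(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Sum values per key. Input must already be sorted by key, as it is
    when Hadoop streaming feeds a reducer."""
    for key, group in groupby(parse_pairs(lines), key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


# Local stand-in for map -> shuffle/sort -> reduce on some mapper output.
mapper_output = ["b\t1\n", "a\t1\n", "b\t1\n"]
print(list(sum_by_key(sorted(mapper_output))))  # [('a', 1.0), ('b', 2.0)]
```

Without the `sorted()` call, `groupby` would report `b` twice, which is exactly what the hand-written reducer would do on unsorted input.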

Here is the command I'm running in the terminal to run the mapper and reducer on the data:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -file ./mapper.py -file ./reducer.py -mapper mapper.py -reducer reducer.py -input /usr/bda-p101234/airline_data.csv -output /usr/bda-p101234/query1_output
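Hadoop's own WARN output says `-file` is deprecated in favor of the generic `-files` option, which takes a comma-separated list. Generic options have to appear before streaming-specific options such as `-mapper` and `-input`. A small sketch that assembles the same command in that order; the helper name `streaming_command` is hypothetical, and the jar path simply mirrors the one in the command above.

```python
import os
from typing import List


def streaming_command(hadoop_home: str, input_path: str, output_path: str) -> List[str]:
    """Build a hadoop-streaming argv that uses -files instead of -file."""
    jar = os.path.join(hadoop_home, "share", "hadoop", "tools", "lib",
                       "hadoop-streaming-3.2.1.jar")
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files must come before streaming options.
        "-files", "mapper.py,reducer.py",
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
        "-input", input_path,
        "-output", output_path,
    ]
```

The list could be passed to `subprocess.run(..., check=True)` with `os.environ["HADOOP_HOME"]` as the first argument. This only removes the deprecation warning; it does not change how the input path is resolved.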

2020-06-23 17:05:39,399 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [./mapper.py, ./reducer.py, /tmp/hadoop-unjar2898662668096241827/] [] /tmp/streamjob7927337112790687471.jar tmpDir=null
2020-06-23 17:05:40,209 INFO client.RMProxy: Connecting to ResourceManager at lmar/192.168.18.100:8032
2020-06-23 17:05:40,379 INFO client.RMProxy: Connecting to ResourceManager at lmar/192.168.18.100:8032
2020-06-23 17:05:40,736 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001
2020-06-23 17:05:40,885 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,585 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,627 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,713 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001
2020-06-23 17:05:41,736 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: hdfs://lmar:9000/usr/bda-p101234/airline_data.csv
Streaming Command Failed!
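The ERROR line shows the job was never launched because the input path does not exist. The path is resolved on HDFS (hdfs://lmar:9000), not on the local disk, so a CSV that exists only locally has to be copied in first, for example with `hdfs dfs -put`. HDFS home directories conventionally live under /user/<name>, while the command uses /usr/..., so it is worth checking which directory the file was actually uploaded to. A minimal pre-flight check, assuming the `hdfs` client is on the PATH, can call `hdfs dfs -test -e`, which exits with status 0 when the path exists. `hdfs_path_exists` is a hypothetical helper, and its `run` parameter exists only so it can be tested without a cluster.

```python
import subprocess
from typing import Callable, List


def hdfs_path_exists(
    path: str,
    run: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `hdfs dfs -test -e PATH` exits with status 0.

    With the default runner, raises FileNotFoundError if the `hdfs`
    executable cannot be found.
    """
    # -test -e reports only through the exit status.
    result = run(["hdfs", "dfs", "-test", "-e", path])
    return result.returncode == 0
```

For example, `hdfs_path_exists("/usr/bda-p101234/airline_data.csv")` returning False before submission points to the same problem the ERROR line reports.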

What am I doing wrong here? Please help!
