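I'm getting the error "Streaming Command Failed!" when I try to run mapper.py and reducer.py on some data. The mapper and reducer scripts themselves run, but the streaming job fails.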
Here is the mapper code:
#!/usr/bin/python
import sys

# Emit "<first CSV field>\t1" for every input line.
for line in sys.stdin:
    data = line.strip().split(",")
    key = data[0]
    value = 1
    print("{0}\t{1}".format(key, value))
Here is the reducer code:
#!/usr/bin/python
import sys

total = 0
oldkey = None

# Input arrives sorted by key, so each key's lines are contiguous.
for line in sys.stdin:
    data = line.strip().split("\t")
    thiskey = data[0]
    value = data[1]
    # Key changed: emit the finished key's total and reset.
    if thiskey != oldkey and oldkey is not None:
        print("{0}\t{1}".format(oldkey, total))
        oldkey = thiskey
        total = 0
    oldkey = thiskey
    total += float(value)

# Emit the total for the last key.
if oldkey is not None:
    print("{0}\t{1}".format(oldkey, total))
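To check the logic of the two scripts outside Hadoop, something like the following could be used. It is only a minimal sketch: the function names `map_lines` and `reduce_pairs` and the sample rows are made up for illustration (the real columns of airline_data.csv are not shown here), and `sorted()` stands in for the sort Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mimic mapper.py: emit (first CSV field, 1) for each input line."""
    for line in lines:
        key = line.strip().split(",")[0]
        yield key, 1


def reduce_pairs(pairs):
    """Mimic reducer.py: sum the values of each run of identical keys."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(float(value) for _, value in group)


# Hypothetical sample rows, not taken from the real data set.
sample = ["AA,2020-01-01,JFK", "DL,2020-01-01,ATL", "AA,2020-01-02,LAX"]

# sorted() plays the role of Hadoop's shuffle/sort step.
result = list(reduce_pairs(sorted(map_lines(sample))))
for key, total in result:
    print("{0}\t{1}".format(key, total))
```

If this produces one count per distinct first field, the map and reduce logic is probably fine and the failure is more likely in how the job is submitted.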
This is the command I'm running in the terminal to apply the mapper and reducer to the data, followed by its output:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -file ./mapper.py -file ./reducer.py -mapper mapper.py -reducer reducer.py -input /usr/bda-p101234/airline_data.csv -output /usr/bda-p101234/query1_output
2020-06-23 17:05:39,399 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [./mapper.py, ./reducer.py, /tmp/hadoop-unjar2898662668096241827/] [] /tmp/streamjob7927337112790687471.jar tmpDir=null
2020-06-23 17:05:40,209 INFO client.RMProxy: Connecting to ResourceManager at lmar/192.168.18.100:8032
2020-06-23 17:05:40,379 INFO client.RMProxy: Connecting to ResourceManager at lmar/192.168.18.100:8032
2020-06-23 17:05:40,736 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001
2020-06-23 17:05:40,885 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,585 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,627 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,713 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001
2020-06-23 17:05:41,736 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: hdfs://lmar:9000/usr/bda-p101234/airline_data.csv
Streaming Command Failed!
What am I doing wrong here? Please help!