I have written map.py and reduce.py programs for the word-count example. Running each program separately works fine, but the final step (submitting the full streaming job) fails with an unexpected error. How can I solve this? I am posting my map.py and reduce.py programs and the error log below.
map.py:
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s\t%s' % (word, "1")
reduce.py:
import sys

c_count = {}

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    try:
        c_count[word] = c_count[word] + count
    except:
        c_count[word] = count

for word in c_count.keys():
    print '%s\t%s' % (word, c_count[word])
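For reference, the two scripts can also be tested together outside of Hadoop with a plain shell pipeline, which mimics what streaming does (this assumes a local copy of the input file, called name.txt here):

cat name.txt | python map.py | sort | python reduce.py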
The log is as follows:
18/02/14 09:47:34 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
18/02/14 09:47:35 INFO mapred.LocalJobRunner: hdfs://localhost:54310/s_data/wc_input/name.txt:0+115 > map
18/02/14 09:47:36 INFO mapreduce.Job: map 67% reduce 0%
18/02/14 09:47:38 INFO mapred.LocalJobRunner: hdfs://localhost:54310/s_data/wc_input/name.txt:0+115 > map
/home/babu/./map1.py: 4: /home/babu/./map1.py: Syntax error: word unexpected (expecting "do")
18/02/14 09:51:56 INFO streaming.PipeMapRed: MRErrorThread done
18/02/14 09:51:56 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/02/14 09:51:57 INFO mapred.LocalJobRunner: map task executor complete.
18/02/14 09:51:57 WARN mapred.LocalJobRunner: job_local771131044_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/02/14 09:51:58 INFO mapreduce.Job: Job job_local771131044_0001 failed with state FAILED due to: NA
18/02/14 09:51:58 INFO mapreduce.Job: Counters: 22
File System Counters
FILE: Number of bytes read=1124
FILE: Number of bytes written=287165
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=115
HDFS: Number of bytes written=0
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=1
Map output records=0
Map output bytes=0
Map output materialized bytes=0
Input split bytes=99
Combine input records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=231735296
File Input Format Counters
Bytes Read=115
18/02/14 09:51:58 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Answer (score: 2):
You can try adding #!/usr/bin/python as the first line of both map.py and reduce.py. Without that shebang line, Hadoop Streaming hands the script to the default shell, which tries to parse the Python for loop as shell syntax and fails with the Syntax error: word unexpected (expecting "do") message seen in the log.
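As a rough sketch of that fix, based on the map.py posted in the question, the mapper would then start like this (reduce.py gets the same first line):

#!/usr/bin/python
import sys

# read lines from standard input and emit "word<TAB>1" for each word
for line in sys.stdin:
    line = line.strip()
    for word in line.split():
        print '%s\t%s' % (word, "1")

It may also be worth confirming that the uploaded script is executable (chmod +x map.py), although the log shows it is already being launched, just by the wrong interpreter.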