NLTK error when running a program with Hadoop Streaming

Time: 2018-01-04 14:38:19

Tags: python hadoop nltk hadoop-streaming wordnet

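My program fails when I run it with Hadoop Streaming. The error comes from the NLTK library: I wrote the following simple test, and it produces the error.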

import nltk
from nltk.corpus import wordnet as wn

# Print a marker if WordNet returns at least one synset for 'dog'.
if wn.synsets('dog'):
    print('something')

However, if I run this program locally instead of through Hadoop Streaming, it runs without any problem.

Note: I want to use this code in my MapReduce program.
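Since the same script behaves differently locally and under Hadoop Streaming, one way to compare the two environments is to have the script report what it sees on stderr, which Hadoop Streaming collects in the task logs. The following is only a minimal sketch: it assumes (the question does not confirm this) that the difference lies in which Python interpreter, NLTK installation, or WordNet data the task can reach on the worker node. `describe_environment` is a hypothetical helper name, not part of NLTK or Hadoop.

```python
import sys


def describe_environment():
    """Collect facts about the interpreter and NLTK setup seen by this process."""
    info = {"python": sys.executable, "version": sys.version.split()[0]}
    try:
        import nltk
    except ImportError as exc:
        # NLTK is not installed for the interpreter running this task.
        info["nltk"] = "import failed: %s" % exc
        return info
    info["nltk"] = nltk.__version__
    # Directories NLTK searches for corpora such as WordNet.
    info["nltk_data_path"] = list(nltk.data.path)
    try:
        nltk.data.find("corpora/wordnet")
        info["wordnet"] = "found"
    except LookupError:
        info["wordnet"] = "not found on nltk_data_path"
    return info


if __name__ == "__main__":
    # Write to stderr so the report does not mix with the job's output.
    for key, value in sorted(describe_environment().items()):
        sys.stderr.write("%s: %s\n" % (key, value))
```

Running this once locally and once as the mapper or reducer of a small streaming job would show whether the two runs use the same interpreter and the same NLTK data directories.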

Here is the error trace:

18/01/04 06:26:49 INFO mapreduce.Job: Job job_1515067732811_0013 running in uber mode : false
18/01/04 06:26:49 INFO mapreduce.Job:  map 0% reduce 0%
18/01/04 06:27:02 INFO mapreduce.Job:  map 50% reduce 0%
18/01/04 06:27:03 INFO mapreduce.Job:  map 100% reduce 0%
18/01/04 06:27:12 INFO mapreduce.Job: Task Id : attempt_1515067732811_0013_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

18/01/04 06:27:23 INFO mapreduce.Job: Task Id : attempt_1515067732811_0013_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

18/01/04 06:27:34 INFO mapreduce.Job: Task Id : attempt_1515067732811_0013_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
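The trace above only says that the streaming subprocess (here a reduce attempt, `_r_`) exited with code 1; the Python-side error itself is not shown in this client output. As a rough sketch of how that exit code can arise and how to make the underlying Python error easier to spot in the task's stderr log, the test from the question could be wrapped in a guard. `run_guarded` and `wordnet_check` are hypothetical names introduced for illustration, and the sketch assumes the failure is a Python exception raised inside the script.

```python
import sys
import traceback


def run_guarded(task, log=sys.stderr):
    """Run task(); on failure, log the traceback and return exit status 1."""
    try:
        task()
    except Exception:
        # A marker line makes the failure easy to find in the task's stderr log.
        log.write("streaming task failed:\n")
        traceback.print_exc(file=log)
        return 1
    return 0


def wordnet_check():
    # The test from the question. The import sits inside the function so that
    # import errors and missing-corpus errors are caught by the guard as well.
    from nltk.corpus import wordnet as wn
    if wn.synsets('dog'):
        print('something')
```

A script used as the mapper or reducer would then end with `sys.exit(run_guarded(wordnet_check))`, so that a failure still produces a non-zero exit status while the traceback is written to stderr.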

0 answers:

No answers