Error when running a Python script in a Hadoop Streaming application

Asked: 2013-10-07 16:22:52

Tags: python hadoop-streaming

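I am running the Hadoop example application from Chuck Lam's 'Hadoop in Action' on a Windows 7 laptop under Cygwin. Python is installed in Cygwin, and the sample Python application runs fine on its own. When I run the Hadoop Streaming application, however, it throws the error shown below. This is the command: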

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'RandomSample.py 10' -file RandomSample.py

RandomSample.py is a simple application that filters its input.
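For readers without the book at hand, a filter of this kind might look roughly like the sketch below. This is not the book's listing: the function name `sample_lines` and the exact sampling rule are assumptions made for illustration. It reads lines from standard input and keeps each one with a probability given as a percentage on the command line.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper that keeps roughly N percent of input lines."""
import random
import sys


def sample_lines(lines, percent, rng=random):
    """Yield each line (without its trailing newline) with probability percent/100."""
    for line in lines:
        # randint(1, 100) is uniform over 1..100, so about percent% of lines pass.
        if rng.randint(1, 100) <= percent:
            yield line.rstrip("\n")


# Runs only when a percentage is supplied, as in: python RandomSample.py 10
if __name__ == "__main__" and len(sys.argv) > 1:
    for kept in sample_lines(sys.stdin, int(sys.argv[1])):
        print(kept)
```

Hadoop Streaming only pipes records through the script's standard input and output, so a script like this can be checked locally by piping a text file into it before submitting a job.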

The following error is thrown:

java.io.IOException: Cannot run program "C:\cygwin64\home\RajS1\hadoop-1.2.1\.\RandomSample.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:376)
        at java.lang.ProcessImpl.start(ProcessImpl.java:136)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)

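I also get a similar error when I run the following command. My guess is that the streaming application should execute the Python script, but it is trying to run it as a Java application. Please suggest a solution. Thanks in advance.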

1 answer:

Answer 0 (score: 2)

You might also want to try this option:

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'python RandomSample.py 10' -file RandomSample.py

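With this change you should no longer get that error. You may instead get an access-denied error for 'python RandomSample.py'. If so, try giving the full path to the Python executable, for example: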

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'c:\mfiles\python RandomSample.py 10' -file RandomSample.py
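Some background on why naming the interpreter helps: Windows error 193 means CreateProcess was handed a file that is not a Windows executable. A .py script is not an executable on its own, so the mapper command has to start with the interpreter. The sketch below assembles such a command as an argument list. The helper name `build_streaming_command` is hypothetical, and the flags are only the ones already used in this thread.

```python
import shlex


def build_streaming_command(interpreter, script, script_args,
                            input_path, output_path, streaming_jar):
    """Return the hadoop streaming invocation as a list of arguments."""
    # Putting the interpreter first means the OS launches a real executable
    # and the script is passed to it as an ordinary argument.
    mapper = " ".join([interpreter, script] + [str(a) for a in script_args])
    return [
        "bin/hadoop", "jar", streaming_jar,
        "-D", "mapred.reduce.tasks=1",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-file", script,
    ]


cmd = build_streaming_command(
    interpreter="python",  # or a full path to the Python executable
    script="RandomSample.py",
    script_args=[10],
    input_path="input/cite75_99.txt",
    output_path="output",
    streaming_jar="contrib/streaming/hadoop-streaming-1.2.1.jar",
)

# Quote each argument so the mapper string survives as one shell word.
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list until the last step makes it easy to swap in a different interpreter path without breaking the quoting around the mapper string.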

Good luck.