Actually, I want to run my MATLAB code with Hadoop Streaming. My question is how to feed the Hadoop Streaming input to my MATLAB script as its input. For example,
here is my MATLAB file imreadtest.m (a simple example):
rgbImage = imread('/usr/new.jpg');
imwrite(rgbImage,'/usr/OT/testedimage1.jpg');
and my shell script is:
#!/bin/sh
matlabbg imreadtest.m -nodisplay
Normally this works fine on my Ubuntu machine (not in Hadoop). I have stored both files in HDFS using Hue. Now my MATLAB script looks like this (imrtest.m):
rgbImage = imread(STDIN);
imwrite(rgbImage,STDOUT);
and my shell script is (imrtest.sh):
#!/bin/sh
matlabbg imrtest.m -nodisplay
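To make my intent clearer: if the streaming input were a plain text file listing image paths (one path per line) instead of the JPEG itself, I imagine the wrapper would have to do something roughly like the following. This is only a sketch of what I am trying to achieve; the temporary paths and the -r invocation are my own assumptions, not working code.

#!/bin/sh
# Sketch only: assumes the -input file is a text list of HDFS image paths,
# one per line, rather than the raw JPEG bytes.
while read imgpath; do
  # copy the image out of HDFS to a local temporary file
  hadoop fs -get "$imgpath" /tmp/in.jpg
  # give MATLAB a real local path instead of trying to pass STDIN to imread
  matlab -nodisplay -nosplash -r "rgbImage=imread('/tmp/in.jpg'); imwrite(rgbImage,'/tmp/out.jpg'); exit"
  # print a text line on stdout so Streaming has something to collect
  echo "processed $imgpath"
done

But I do not know whether this is the right way to connect Streaming to MATLAB, which is why I am asking.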
I tried to run this with Hadoop Streaming:
hadoop@xxx:/usr/local/master/hadoop$ $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar -mapper /usr/OT/imrtest.sh -file /usr/OT/imrtest.sh -input /usr/OT/testedimage.jpg -output /usr/OT/opt
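As far as I understand, Streaming simply pipes each input record to the mapper's stdin and collects whatever the mapper writes to stdout, so locally the mapper should behave roughly like the pipe below (a sketch; treating /usr/OT/testedimage.jpg as a local copy of the test image is an assumption):

# Local equivalent of what Streaming does with the mapper (sketch):
# feed the input to the script's stdin and capture its stdout.
cat /usr/OT/testedimage.jpg | sh /usr/OT/imrtest.sh > /tmp/mapper_output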
However, the job fails with an error like this:
packageJobJar: [/usr/OT/imrtest.sh, /usr/local/master/temp/hadoop-unjar4018041785380098978/] [] /tmp/streamjob7077345699332124679.jar tmpDir=null
14/03/06 15:51:41 WARN snappy.LoadSnappy: Snappy native library is available
14/03/06 15:51:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/03/06 15:51:41 INFO snappy.LoadSnappy: Snappy native library loaded
14/03/06 15:51:41 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/06 15:51:42 INFO streaming.StreamJob: getLocalDirs(): [/usr/local/master/temp/mapred/local]
14/03/06 15:51:42 INFO streaming.StreamJob: Running job: job_201403061205_0015
14/03/06 15:51:42 INFO streaming.StreamJob: To kill this job, run:
14/03/06 15:51:42 INFO streaming.StreamJob: /usr/local/master/hadoop/bin/hadoop job -Dmapred.job.tracker=slave3:8021 -kill job_201403061205_0015
14/03/06 15:51:42 INFO streaming.StreamJob: Tracking URL: http://slave3:50030/jobdetails.jsp?jobid=job_201403061205_0015
14/03/06 15:51:43 INFO streaming.StreamJob: map 0% reduce 0%
14/03/06 15:52:15 INFO streaming.StreamJob: map 100% reduce 100%
14/03/06 15:52:15 INFO streaming.StreamJob: To kill this job, run:
14/03/06 15:52:15 INFO streaming.StreamJob: /usr/local/master/hadoop/bin/hadoop job -Dmapred.job.tracker=slave3:8021 -kill job_201403061205_0015
14/03/06 15:52:15 INFO streaming.StreamJob: Tracking URL: http://slave3:50030/jobdetails.jsp?jobid=job_201403061205_0015
14/03/06 15:52:15 ERROR streaming.StreamJob: Job not successful. Error: NA
14/03/06 15:52:15 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
The jobtracker error log for this job is:
HOST=null
USER=hadoop
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Thu Mar 06 15:51:51 IST 2014
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:297)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:110)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.streaming.Pipe
java.io.IOException: log:null
.
.
.
Please suggest how I can get the input for my MATLAB script from the Hadoop Streaming input, and likewise how to send its output back to Streaming.