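So I'm trying to plot some data through Apache Pig using Python/matplotlib.
Specifically, I want to use Pig to read and process the data, then stream it through a plotting script written in Python.
I've been using the plotting script outside of Apache Pig for a while without any incidents, so I'm fairly sure it isn't the problem, but I can post it if anyone wants.
Now for my Pig script: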
%default BINSIZE 5.0
/* functions */
define plot `test_plot.py -f output_image.png` ship('/tank/user/eric/dev/pig/test_plot.py');
/* load the data */
cd /scratch;
VALUE = load 'test_data.txt' as (x_val:double);
/* bin the data */
BINNED_VAL = foreach VALUE
generate (double)((int)( x_val / $BINSIZE )) * $BINSIZE;
/* make a histogram */
COUNTED = group BINNED_VAL by $0;
HIST = foreach COUNTED generate group, COUNT(BINNED_VAL);
A = stream HIST through plot;
dump A;
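The -f flag of test_plot.py specifies the output file. The script reads from stdin but doesn't write to stdout, so A is never actually set to anything, which means dump A doesn't really do anything (it does throw an error, though).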
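Since test_plot.py itself isn't posted, here is a hypothetical minimal stand-in showing the shape of a script that could sit at the end of a Pig STREAM. It assumes Pig's default PigStreaming serialization (one tuple per line, fields separated by tabs), reads the HIST rows from stdin and writes a PNG named by -f. The function names, the `optparse` handling and the `Agg` backend choice are all assumptions for illustration, not the original script; `optparse` is used because `argparse` is not in Python 2.6's standard library.

```python
# Hypothetical stand-in for test_plot.py (the real script is not shown).
# Assumes Pig's default streaming format: one tuple per line, tab-separated.
import sys
from optparse import OptionParser  # argparse is not in Python 2.6's stdlib


def read_histogram(lines):
    """Parse HIST rows such as "25.0<TAB>6" into sorted (bin, count) pairs.

    `lines` may be any iterable of text lines, e.g. sys.stdin.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        rows.append((float(fields[0]), int(fields[1])))
    rows.sort()  # GROUP BY output order is not guaranteed
    return rows


def plot_histogram(rows, out_path, bin_size=5.0):
    """Draw one bar per bin and save it to out_path without needing a display."""
    import matplotlib
    matplotlib.use("Agg")  # assumption: task nodes have no X server
    import matplotlib.pyplot as plt

    bins = [b for b, _ in rows]
    counts = [c for _, c in rows]
    plt.bar(bins, counts, width=bin_size, align="edge")  # bins are left edges
    plt.savefig(out_path)


def main(argv):
    """Parse -f, read HIST rows from stdin and write the plot."""
    parser = OptionParser()
    parser.add_option("-f", dest="out_path", default="output_image.png",
                      help="output image file")
    options, _ = parser.parse_args(argv)
    plot_histogram(read_histogram(sys.stdin), options.out_path)
    return 0
```

In a real script these pieces would be wired up with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`. Note that a relative path such as output_image.png would presumably be written to the working directory of whichever task runs the script, not to the directory Pig was launched from.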
Here are the contents of test_data.txt:
5
5
6
6.5
8
12
28
25
25
25
26
29
32
35
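For reference, the binning and counting in the Pig script can be mirrored in a few lines of Python on the 14 values above. This shows what HIST should contain, and therefore what the plotting script should receive, before any streaming is involved. It is only a local illustration of the same arithmetic, not anything Pig runs.

```python
# Mirror of the Pig pipeline: BINNED_VAL -> COUNTED -> HIST.
BINSIZE = 5.0  # same as %default BINSIZE 5.0

# Contents of test_data.txt
values = [5, 5, 6, 6.5, 8, 12, 28, 25, 25, 25, 26, 29, 32, 35]


def bin_value(x, bin_size=BINSIZE):
    # Pig: (double)((int)(x_val / $BINSIZE)) * $BINSIZE
    # int() truncates toward zero, like Java's (int) cast.
    return float(int(x / bin_size)) * bin_size


# group BINNED_VAL by $0, then COUNT(BINNED_VAL) per group
counts = {}
for v in values:
    b = bin_value(v)
    counts[b] = counts.get(b, 0) + 1

hist = sorted(counts.items())
print(hist)  # [(5.0, 5), (10.0, 1), (25.0, 6), (30.0, 1), (35.0, 1)]
```

Pig itself does not guarantee the order of grouped output, which is why the sketch sorts before printing.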
Here is the error message I get:
2014-07-07 12:49:30,973 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
2014-07-07 12:49:30,973 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-07-07 12:49:30,974 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0.2.1.2.1-471 0.12.1.2.1.2.1-471 eric 2014-07-07 12:48:57 2014-07-07 12:49:30 GROUP_BY,STREAMING
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1404713698289_0021 A,BINNED_VAL,COUNTED,HIST,VALUE GROUP_BY,STREAMING,COMBINER Message: Job failed! hdfs://hypno.st.hmc.edu:8020/tmp/temp-2122498041/tmp461187682,
Input(s):
Failed to read data from "hdfs://hypno.st.hmc.edu:8020/scratch/test_data.txt"
Output(s):
Failed to produce result in "hdfs://hypno.st.hmc.edu:8020/tmp/temp-2122498041/tmp461187682"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1404713698289_0021
2014-07-07 12:49:30,974 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-07-07 12:49:30,986 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
Details at logfile: /tank/user/eric/dev/pig/pig_1404762535492.log
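Exit status 1 only says that the Python process failed; the error output the script itself produced does not appear anywhere above. One hedged way to see it is to feed the script input in roughly the form PigStreaming sends by default (tab-separated fields, one tuple per line) and capture stderr, ideally on one of the cluster nodes so the environment matches. `run_stream_command` is a hypothetical helper written for this purpose, not part of Pig.

```python
import subprocess
import sys


def run_stream_command(cmd, rows):
    """Pipe rows to cmd as tab-separated lines and return (status, stdout, stderr)."""
    payload = "".join("\t".join(str(f) for f in row) + "\n" for row in rows)
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)  # text in, text out
    out, err = proc.communicate(payload)
    return proc.returncode, out, err


# Self-check with a throwaway child that echoes its input to stderr and
# exits with status 1, mimicking a failing streaming command.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write(sys.stdin.read()); sys.exit(1)"]
status, out, err = run_stream_command(demo_cmd, [(5.0, 5), (25.0, 6)])
```

Against the real script this would look like `run_stream_command(["./test_plot.py", "-f", "output_image.png"], [(5.0, 5), (25.0, 6)])`, after which printing the returned stderr should surface the Python traceback, if there is one.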
Here is the output log file:
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_m_000000_0 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_0 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_0 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_1 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_1 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_2 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_2 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_3 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Error message from task (reduce) task_1404713698289_0021_r_000000
-----------------------------------------------------------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
Pig Stack Trace
---------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.PigServer.openIterator(PigServer.java:872)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:607)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
My Pig version is Apache Pig version 0.12.1.2.1.2.1-471, and I'm using Python 2.6.6.
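Because the script runs fine outside Pig but exits with status 1 inside it, the Python seen by the task nodes may differ from the one in an interactive shell (interpreter version, whether matplotlib is importable, whether DISPLAY is set). The sketch below is a hypothetical probe for collecting those facts; streamed through Pig in place of test_plot.py, it would write one tab-separated key/value line per fact to stdout, which `dump` should then be able to show. The script and the choice of facts are assumptions, not part of the original setup.

```python
# Hypothetical environment probe; the facts collected here are assumptions
# about what commonly differs between a login shell and a task node.
import os
import sys


def describe_environment():
    """Return (key, value) pairs describing the interpreter and its surroundings."""
    info = [
        ("python", sys.version.split()[0]),
        ("executable", sys.executable),
        ("cwd", os.getcwd()),
        ("DISPLAY", os.environ.get("DISPLAY", "<unset>")),
    ]
    try:
        import matplotlib
        info.append(("matplotlib", matplotlib.__version__))
    except ImportError as exc:  # e.g. matplotlib missing on this node
        info.append(("matplotlib", "import failed: %s" % exc))
    return info


def report(stream_in, stream_out):
    """Consume the streamed tuples, then write one tab-separated line per fact."""
    for _ in stream_in:
        pass  # read all input before exiting
    for key, value in describe_environment():
        stream_out.write("%s\t%s\n" % (key, value))
```

It would be wired up with `if __name__ == "__main__": report(sys.stdin, sys.stdout)` and shipped with the same `define ... ship(...)` pattern already used for test_plot.py.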
I'm also quite new to Pig, so I apologize if I've missed something silly.
If anyone can point me in the right direction, I'd greatly appreciate it. :)