Why won't my simple Pig UDF run?

Asked: 2016-11-13 22:06:54

Tags: python apache-pig udf

I'm trying to use a very simple Python UDF. I'll put everything here in the hope that someone can spot what I'm missing.

The test.csv file:

john,18,a
bob,20,f
mary,19,q
jill,21,m

The test.py file:

from pig_util import outputSchema

@outputSchema("squareSchema:int")
def square():
  return 5
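
For context, a quick way to confirm that the function itself is importable and callable is to exercise it with plain Python outside of Pig. This check is not part of the original question; it assumes test.py sits in the current directory and that Pig's pig_util.py is on PYTHONPATH so the import at the top of test.py succeeds:

# Local sanity check (not from the original post): call the UDF directly,
# bypassing Pig and the streaming layer entirely.
from test import square

print(square())   # expected output: 5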

My Pig code:

A = LOAD 'hdfs:/home/ubuntu/pigtest/test.csv' USING PigStorage(',') AS (name:chararray, age:int, letter:chararray);
Register 'test.py' using streaming_python as tester;
answer = FOREACH A GENERATE tester.square(), name;
dump A;
dump answer;

The dump A command works as expected and I get everything I expect to see. DESCRIBE A gives the following:

A: {name: chararray,age: int,letter: chararray}

dump answer does not work, and I've stripped things down to the point where I no longer know what the problem is. My log output after running the dump answer; command is below:

2016-11-13 19:21:55,118 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2016-11-13 19:21:55,149 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-13 19:21:55,151 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-13 19:21:55,152 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-13 19:21:55,153 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-13 19:21:55,163 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for A: $1, $2
2016-11-13 19:21:55,170 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-11-13 19:21:55,171 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-11-13 19:21:55,172 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-11-13 19:21:55,182 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-13 19:21:55,184 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-11-13 19:21:55,185 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-11-13 19:21:55,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-11-13 19:21:55,187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-11-13 19:21:55,660 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.15.0-core-h2.jar to DistributedCache through /tmp/temp-654495955/tmp1312094444/pig-0.15.0-core-h2.jar
2016-11-13 19:21:55,679 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-654495955/tmp892215964/automaton-1.11-8.jar
2016-11-13 19:21:55,703 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-654495955/tmp1940980863/antlr-runtime-3.4.jar
2016-11-13 19:21:55,731 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp-654495955/tmp1550969813/joda-time-2.5.jar
2016-11-13 19:21:55,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/tmp/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar to DistributedCache through /tmp/temp-654495955/tmp-24373919/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar
2016-11-13 19:21:55,788 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-11-13 19:21:55,792 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-11-13 19:21:55,793 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-11-13 19:21:55,793 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-11-13 19:21:55,816 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-11-13 19:21:55,818 [JobControl] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-11-13 19:21:55,832 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-11-13 19:21:55,876 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-11-13 19:21:55,877 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-11-13 19:21:55,880 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-13 19:21:55,888 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-11-13 19:21:55,952 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local55140129_0002
2016-11-13 19:21:56,266 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1479086516035/pig-0.15.0-core-h2.jar <- /home/ubuntu/pig-0.15.0-core-h2.jar
2016-11-13 19:21:56,268 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:54310/tmp/temp-654495955/tmp1312094444/pig-0.15.0-core-h2.jar as file:/app/hadoop/tmp/mapred/local/1479086516035/pig-0.15.0-core-h2.jar
2016-11-13 19:21:56,268 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1479086516036/automaton-1.11-8.jar <- /home/ubuntu/automaton-1.11-8.jar
2016-11-13 19:21:56,271 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:54310/tmp/temp-654495955/tmp892215964/automaton-1.11-8.jar as file:/app/hadoop/tmp/mapred/local/1479086516036/automaton-1.11-8.jar
2016-11-13 19:21:56,271 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1479086516037/antlr-runtime-3.4.jar <- /home/ubuntu/antlr-runtime-3.4.jar
2016-11-13 19:21:56,274 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:54310/tmp/temp-654495955/tmp1940980863/antlr-runtime-3.4.jar as file:/app/hadoop/tmp/mapred/local/1479086516037/antlr-runtime-3.4.jar
2016-11-13 19:21:56,275 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1479086516038/joda-time-2.5.jar <- /home/ubuntu/joda-time-2.5.jar
2016-11-13 19:21:56,277 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:54310/tmp/temp-654495955/tmp1550969813/joda-time-2.5.jar as file:/app/hadoop/tmp/mapred/local/1479086516038/joda-time-2.5.jar
2016-11-13 19:21:56,278 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1479086516039/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar <- /home/ubuntu/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar
2016-11-13 19:21:56,280 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:54310/tmp/temp-654495955/tmp-24373919/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar as file:/app/hadoop/tmp/mapred/local/1479086516039/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar
2016-11-13 19:21:56,350 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1479086516035/pig-0.15.0-core-h2.jar
2016-11-13 19:21:56,351 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1479086516036/automaton-1.11-8.jar
2016-11-13 19:21:56,351 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1479086516037/antlr-runtime-3.4.jar
2016-11-13 19:21:56,352 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1479086516038/joda-time-2.5.jar
2016-11-13 19:21:56,352 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1479086516039/PigScriptUDF-0d8d30b39cd16119ea7d9b09022d44a8.jar
2016-11-13 19:21:56,353 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2016-11-13 19:21:56,353 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local55140129_0002
2016-11-13 19:21:56,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,answer
2016-11-13 19:21:56,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],answer[-1,-1] C:  R:
2016-11-13 19:21:56,358 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-11-13 19:21:56,359 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local55140129_0002]
2016-11-13 19:21:56,359 [Thread-103] INFO  org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2016-11-13 19:21:56,366 [Thread-103] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2016-11-13 19:21:56,367 [Thread-103] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-13 19:21:56,367 [Thread-103] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-13 19:21:56,368 [Thread-103] INFO  org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2016-11-13 19:21:56,387 [Thread-103] INFO  org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2016-11-13 19:21:56,387 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local55140129_0002_m_000000_0
2016-11-13 19:21:56,425 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorProcessTree : [ ]
2016-11-13 19:21:56,434 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 25
Input split[0]:
   Length = 25
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:

-----------------------

2016-11-13 19:21:56,442 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://localhost:54310/home/ubuntu/pigtest/test.csv:0+25
2016-11-13 19:21:56,473 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-13 19:21:56,479 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],answer[-1,-1] C:  R:
2016-11-13 19:21:56,484 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2016-11-13 19:21:56,485 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.id is deprecated. Instead, use mapreduce.job.id
2016-11-13 19:21:56,485 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2016-11-13 19:21:56,526 [Thread-103] INFO  org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-11-13 19:21:56,533 [Thread-103] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local55140129_0002
java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'square'



        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'square'



        at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:503)
2016-11-13 19:22:01,364 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-11-13 19:22:01,365 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local55140129_0002 has failed! Stop running all dependent jobs
2016-11-13 19:22:01,365 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-11-13 19:22:01,366 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-11-13 19:22:01,367 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-11-13 19:22:01,368 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-11-13 19:22:01,369 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.6.2   0.15.0  ubuntu  2016-11-13 19:21:55     2016-11-13 19:22:01     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local55140129_0002  A,answer        MAP_ONLY        Message: Job failed!   hdfs://localhost:54310/tmp/temp-654495955/tmp-1492296578,

Input(s):
Failed to read data from "hdfs:/home/ubuntu/pigtest/test.csv"

Output(s):
Failed to produce result in "hdfs://localhost:54310/tmp/temp-654495955/tmp-1492296578"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local55140129_0002


2016-11-13 19:22:01,372 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-11-13 19:22:01,388 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias answer
Details at logfile: /home/ubuntu/pig_1479086479169.log

I think the error is here:

    java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'square'

But I can't figure out why it's unhappy. Thanks in advance for any help.
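
For readers unfamiliar with the error text, a KeyError in Python is raised when a dictionary lookup fails, so the message above suggests that, somewhere in the streaming layer, no function ended up registered under the name 'square'. The snippet below only illustrates that general failure mode with hypothetical names; it is not Pig's actual internal code:

# Illustration only, hypothetical names: a KeyError on 'square' means some
# mapping was asked for that key and did not contain it.
registered_udfs = {}              # imagine the UDF never got added here

try:
    registered_udfs['square']()   # the lookup fails before any call happens
except KeyError as err:
    print("KeyError: %s" % err)   # prints: KeyError: 'square'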

0 Answers:

No answers yet