Pig: confusion between relation names and schema names

Time: 2015-01-31 16:15:26

Tags: hadoop mapreduce apache-pig bigdata

In Pig Latin, this works as expected:

filtered = FILTER records BY age > 27;

But this throws an exception (when I run DUMP filtered):

filtered = FILTER records BY records.age > 27;

Here is the exception:

java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (John,Wilk,27,M), 2nd :(Tri,Tim,27,F)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (John,Wilk,27,M), 2nd :(Tri,Tim,27,F)
    at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:119)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:345)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:394)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr.getNextBoolean(GreaterThanExpr.java:74)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:144)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

What is the difference between the two? Aren't they the same?

1 answer:

Answer 0 (score: 2)

No, the two are not the same.

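  1. The first statement is perfectly valid. Here Pig iterates over every row of records and applies the filter condition (age > 27) to that row's age field. This is the standard way to write a FILTER statement.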

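  2. In the second case you are using the dereference operator (.) to access the field. The dereference operator is mainly used to reach into values of complex data types (tuples, bags and maps). When it is applied to a relation name, as in records.age, Pig treats the whole relation as a scalar and always expects it to produce exactly one row. Unfortunately records holds more than one row (the message lists the first two it encountered, (John,Wilk,27,M) and (Tri,Tim,27,F)), and that is why you get "Scalar has more than one row in the output". If records contained only a single row, the statement would be valid.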
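
To make the difference concrete, here is a minimal sketch of a case where the relation.field form is appropriate. It assumes a comma-separated input file named people.csv and the field names first, last and gender; those are hypothetical, since only age appears in the question. The relation is first collapsed to a single row, so projecting it as a scalar is safe.

```pig
-- Hypothetical input path and field names; only age comes from the question.
records = LOAD 'people.csv' USING PigStorage(',')
          AS (first:chararray, last:chararray, age:int, gender:chararray);

-- Plain field reference: evaluated once for each row of records.
filtered = FILTER records BY age > 27;

-- Collapse records into exactly one row holding the average age.
-- Inside this FOREACH, records.age dereferences the bag created by GROUP,
-- which is the usual job of the dereference operator.
grouped = GROUP records ALL;
stats   = FOREACH grouped GENERATE AVG(records.age) AS avg_age;

-- Scalar projection: stats has a single row, so stats.avg_age is one value.
-- The explicit cast keeps both sides of the comparison as double.
above_avg = FILTER records BY (double)age > stats.avg_age;

DUMP above_avg;
```

Writing records.age in the last FILTER instead of stats.avg_age would bring back the original error whenever records has more than one row.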