使用Pig 0.10.1,我有以下脚本:
br = LOAD 'cfs:///somefile';
SPLIT br INTO s0 IF (sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF (adp >= 1.0), not_s1 OTHERWISE;
SPLIT not_s1 INTO s2 IF (p > 1L), not_s2 OTHERWISE;
SPLIT not_s2 INTO s3 IF (s > 0L), s4 OTHERWISE;
tmp0 = FOREACH s0 GENERATE b, 'x' as seg;
tmp1 = FOREACH s1 GENERATE b, 'y' as seg;
tmp2 = FOREACH s2 GENERATE b, 'z' as seg;
tmp3 = FOREACH s3 GENERATE b, 'w' as seg;
tmp4 = FOREACH s4 GENERATE b, 't' as seg;
out = UNION ONSCHEMA tmp0, tmp1, tmp2, tmp3, tmp4;
dump out;
br
中加载的文件是由以前的Pig脚本生成的,并且具有嵌入式架构(.pig_schema文件):
describe br
br: {b: chararray,p: long,afternoon: long,ddv: long,pa: long,t0002: long,t0204: long,t0406: long,t0608: long,t0810: long,t1012: long,t1214: long,t1416: long,t1618: long,t1820: long,t2022: long,t2200: long,browser_software: chararray,first_timestamp: long,last_timestamp: long,os: chararray,platform: chararray,sp: int,adp: double}
从上面编辑了一些不相关的字段(我目前无法完全公开数据的性质)。
脚本失败,出现以下错误:
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Integer cannot be cast to java.lang.Long
但是,转储s0
,s1
,s2
,s3
,s4
或tmp0
,tmp1
,{{ 1}} tmp2
,tmp3
完美无缺。
Hadoop作业跟踪器显示以下错误4次:
tmp4
我也试过这个代码段(而不是原来的java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.doComparison(EqualToExpr.java:116)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:83)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
):
dump
我得到一个不同的(但我认为相关)错误:
x = UNION s1,s2;
y = FOREACH x GENERATE b;
dump y;
作业跟踪器错误(重复4次):
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Double cannot be cast to java.lang.Long
我试图找到涉及工会的已知错误而没有运气。这真是令人费解。想法?
答案 0 :(得分:1)
进一步挖掘后,看起来这是一个错误。我创建了一个ticket for it。
答案 1 :(得分:0)
当你在两个或多个关系之间执行联合操作时,我们应该处理字段的数据类型。
由于数据类型不兼容而引发了上述问题。为避免这种情况,请将chararray声明为bytearray.you将消除此错误。