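Pig script works on 0.12.0 but not on 0.11.1

Time: 2013-12-06 02:21:46

Tags: java hadoop mapreduce apache-pig

I wrote this Pig script. It runs perfectly on version 0.12.0, but I can't get it to run on 0.11.1, and I can't figure out what is actually missing.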

data = LOAD '<file_name>' USING PigStorage(',') AS 
(Year,Month:int,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,
    FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay:int,DepDelay,Origin,
    Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,
    WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay);
A = FILTER data BY (ArrDelay > 0);
X = GROUP A BY (Dest, Year, 
        (
            Case
                when Month>2 AND Month<6 THEN 'SPRING'
                when Month>5 AND Month<9 THEN 'SUMMER'
                when Month>8 AND Month<12 THEN 'FALL'
                when Month==12 OR (Month<3 AND Month>0) THEN 'WINTER'
            END
        )
    );
Y = FOREACH X GENERATE group.Dest, group.Year, group.$2, SUM(A.ArrDelay);
STORE Y INTO 'myoutput';

This is the exception thrown when the script runs.

Pig Stack Trace
---------------
ERROR 1200: <file DelayBySeasonPerYear.pig, line 7, column 16>  Syntax error, unexpected symbol at or near 'Dest'

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. <file DelayBySeasonPerYear.pig, line 7, column 16>  Syntax error, unexpected symbol at or near 'Dest'
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: Failed to parse: <file DelayBySeasonPerYear.pig, line 7, column 16>  Syntax error, unexpected symbol at or near 'Dest'
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:235)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:177)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
    ... 14 more

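1 answer:

Answer 0 (score: 2)

CASE was only implemented recently, in version 0.12. It is not available in 0.11, which is why the parser rejects the GROUP statement there.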

See:


Suggested workaround:

Write a UDF that takes a month value and returns a chararray with the season spelled out. Then use this UDF in a FOREACH statement after the FILTER.

...
A = FILTER data BY (ArrDelay > 0);
A = FOREACH A GENERATE MySeasonUDF(Month) as Season, Dest, Year, ArrDelay;
X = GROUP A BY (Dest, Year, Season);
...
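
The answer does not show the UDF itself, so here is a minimal sketch of the mapping logic it would need. It is written in plain Java so that it compiles without Pig on the classpath. The class name SeasonLogic and the method seasonOf are hypothetical. In a real UDF, a class such as MySeasonUDF extending org.apache.pig.EvalFunc<String> would read the month from the input Tuple in exec(Tuple) and delegate to a method like this one. The ranges mirror the CASE expression in the question. Returning null for a missing or out-of-range month is an assumption about how a CASE without ELSE behaves.

```java
// Hypothetical helper: class and method names are illustrative, not part of Pig.
final class SeasonLogic {
    private SeasonLogic() {}

    // Maps a month number (1-12) to a season name, using the same ranges as
    // the CASE expression in the question. Returns null when the month is
    // missing or out of range (assumed to match CASE without ELSE).
    static String seasonOf(Integer month) {
        if (month == null) {
            return null;
        }
        int m = month;
        if (m >= 3 && m <= 5) {
            return "SPRING";
        }
        if (m >= 6 && m <= 8) {
            return "SUMMER";
        }
        if (m >= 9 && m <= 11) {
            return "FALL";
        }
        if (m == 12 || m == 1 || m == 2) {
            return "WINTER";
        }
        return null;
    }
}
```

Once wrapped in an EvalFunc and packaged into a jar, the UDF would still need to be made available to the script with REGISTER before the FOREACH line can call it.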