Unable to perform UNION on only some relations in Apache Pig

Time: 2017-11-26 21:39:07

Tags: apache-pig union

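I have run into a UNION problem in Pig that I cannot solve. When I perform a UNION on some relations in a script it works fine, but on other relations in the same script it does not. Here I perform the UNION on the two relations one and two. It fails for these two relations as well as for many others. Here is my script: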

A = LOAD '/home/biadmin/datasets/Datasets/HomeA/2014/homeA2.csv' USING PigStorage(',') AS (Date:chararray ,use:float, gen:float, FurnaceHRV:float, CellarOutlets:float, WashingMachine:float, FridgeRange:float, DisposalDishwasher:float, KitchenLights:float, BedroomOutlets:float, BedroomLights:float, MasterOutlets:float, MasterLights:float, DuctHeaterHRV:float);
C = FOREACH A GENERATE Date,SUM(TOBAG(use..)) AS total:float, '1388552400' AS unixconst1:chararray, '86400' AS dayconst1:chararray;                                        
hourly = FILTER C BY ENDSWITH(Date, '00'); 
hourlymd = FOREACH hourly GENERATE *, SUBSTRING(Date,0,INDEXOF(Date,'/',0)) as month1:chararray, SUBSTRING(Date,INDEXOF(Date,'/',0)+1,LAST_INDEX_OF(Date,'/')) as day1:chararray, SUBSTRING(Date,LAST_INDEX_OF(Date,'/')+1,INDEXOF(Date,' ',0)) as year1:chararray, SUBSTRING(Date,INDEXOF(Date,' ',0)+1,INDEXOF(Date,':',0)) as hour1:chararray;                                                             
hourlymdB = FOREACH hourlymd GENERATE (int)(hour1) AS hour:int, (int)(day1) AS day:int, (int)(month1) AS month:int,  (int)(year1) AS year:int, (int)(unixconst1) AS unixconst:int, (int)(dayconst1) AS dayconst:int;
SPLIT hourlymdB INTO 
        one IF(month==1),
        two IF(month==2),
        three IF(month==3),
        four IF(month==4),
        five IF(month==5),
        six IF(month==6),
        seven IF(month==7),
        eight IF(month==8),
        nine IF(month==9),
        ten IF(month==10),
        eleven IF(month==11),
        twelve IF(month==12),
        rest OTHERWISE;

yearone = UNION one, two;

STORE yearone INTO '/home/biadmin/datasets/output/yearone';

This code does not work, yet Pig reports no error in the script itself. This is the output when it runs:

Pig Stack Trace
---------------
ERROR 0: java.lang.NullPointerException

org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:283)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
    at org.apache.pig.PigServer.execute(PigServer.java:1341)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:392)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:232)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:608)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:619)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.Job$4.run(Job.java:963)
    at org.apache.hadoop.mapreduce.Job$4.run(Job.java:961)
    at java.security.AccessController.doPrivileged(AccessController.java:366)
    at javax.security.auth.Subject.doAs(Subject.java:572)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:961)
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:218)
    at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:353)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:233)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:364)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:277)
    ... 16 more
================================================================================

However, if I perform the UNION on A and C instead, that is, if I write:

yearone = UNION A, C;

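then it works without any problem and stores the relation at the destination. I cannot figure out why. I ran it both in the grunt> shell (local mode) and from outside the shell with the command pig -x local 'file.pig'.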

Any help is much appreciated.

1 Answer:

Answer 0: (score: 1)

Compare the syntax of UNION A, C with that of yearone = onev UNION twov. The second UNION syntax is incorrect. Change it to

yearone =  UNION onev, twov;

Note: I do not see the relations onev and twov in your script. I am assuming you have them in your script.
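To make the prefix form concrete, here is a minimal sketch in Pig Latin. It assumes a hypothetical two-column input file named months.csv; the file and the relation name data are not part of the original script. The relations one and two follow the SPLIT in the question. The DESCRIBE calls are only there to confirm that both inputs carry the same schema before they are combined.

```pig
-- Hypothetical input: a CSV with one "month,day" pair per line (assumed, not from the question).
data = LOAD 'months.csv' USING PigStorage(',') AS (month:int, day:int);

-- Split by month, as in the question; everything else goes to rest.
SPLIT data INTO
    one IF (month == 1),
    two IF (month == 2),
    rest OTHERWISE;

-- Both relations should report the same schema.
DESCRIBE one;
DESCRIBE two;

-- UNION is written in prefix form: the keyword first, then a comma-separated list of relations.
yearone = UNION one, two;

DUMP yearone;
```

An infix spelling such as yearone = one UNION two; is not accepted by the Pig Latin grammar, which is the point the answer makes.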