阅读Pig

时间:2015-11-25 13:25:16

标签: apache-pig glob orc

我正在尝试读取/加载目录中存在的多个Orc文件使用pig的OrcStorage()。我尝试使用glob技术,但这对我不起作用并抛出错误,说文件不存在,哪里可用。请让我知道如何在猪中实现此功能。

使用的示例文件:

hadoop fs -ls /sandbox/sandbox28/pig_demo/input/ORC/data_dt={2015111900,2015111901}
Found 2 items
-rw-r--r--   3 as303e hdfs     302986 2015-11-19 05:12 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111900/000000_0
-rw-r--r--   3 as303e hdfs     302986 2015-11-19 05:12 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111900/000001_0
Found 2 items
-rw-r--r--   3 as303e ksndbx28     302986 2015-11-25 04:34 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111901/000000_0
-rw-r--r--   3 as303e ksndbx28     302986 2015-11-25 04:34 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111901/000001_0

使用的代码:

A = load '/sandbox/sandbox28/pig_demo/input/ORC/data_dt={2015111900,2015111901}' Using OrcStorage();

B= limit A 2;

DUMP B;

错误日志:

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: B: Store(hdfs://localhost:8020/tmp/temp666047359/tmp808921130:org.apache.pig.impl.io.InterStorage) - scope-5 Operator Key: scope-5): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: B: Limit - scope-4 Operator Key: scope-4): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
        at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:161)
        at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:278)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
        at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
        ... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: B: Limit - scope-4 Operator Key: scope-4): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNextTuple(POLimit.java:122)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
        ... 22 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:131)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
        ... 24 more
Caused by: org.apache.hadoop.mapred.InvalidInputException: File does not exist: hdfs://localhost:8020/sandbox/sandbox28/pig_demo/input/ORC/data_dt={2015111900,2015111901}
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:961)
        at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
        at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
        at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:99)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
        ... 25 more

0 个答案:

没有答案