我正在尝试读取/加载目录中存在的多个Orc文件使用pig的OrcStorage()。我尝试使用glob技术,但这对我不起作用并抛出错误,说文件不存在,哪里可用。请让我知道如何在猪中实现此功能。
使用的示例文件:
hadoop fs -ls /sandbox/sandbox28/pig_demo/input/ORC/data_dt={2015111900,2015111901}
Found 2 items
-rw-r--r-- 3 as303e hdfs 302986 2015-11-19 05:12 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111900/000000_0
-rw-r--r-- 3 as303e hdfs 302986 2015-11-19 05:12 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111900/000001_0
Found 2 items
-rw-r--r-- 3 as303e ksndbx28 302986 2015-11-25 04:34 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111901/000000_0
-rw-r--r-- 3 as303e ksndbx28 302986 2015-11-25 04:34 /sandbox/sandbox28/pig_demo/input/ORC/data_dt=2015111901/000001_0
使用的代码:
A = load '/sandbox/sandbox28/pig_demo/input/ORC/data_dt={2015111900,2015111901}' Using OrcStorage();
B= limit A 2;
DUMP B;
错误日志:
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: B: Store(hdfs://localhost:8020/tmp/temp666047359/tmp808921130:org.apache.pig.impl.io.InterStorage) - scope-5 Operator Key: scope-5): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: B: Limit - scope-4 Operator Key: scope-4): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:161)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:278)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: B: Limit - scope-4 Operator Key: scope-4): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNextTuple(POLimit.java:122)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
... 22 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:131)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
... 24 more
Caused by: org.apache.hadoop.mapred.InvalidInputException: File does not exist: hdfs://localhost:8020/sandbox/sandbox28/pig_demo/input/ORC/data_dt={2015111900,2015111901}
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:961)
at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:99)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
... 25 more