Apache Pig - ERROR 6007: "Unable to check name" message

Asked: 2013-10-12 05:23:41

Tags: java hbase apache-pig

Environment: hadoop 1.0.3, hbase 0.94.1, pig 0.11.1

I am running a Pig script from a Java program, and I sometimes get the following error, though not every time. The program loads a file from HDFS, performs some transformations, and stores the result into HBase. My program is multi-threaded. I have made PigServer thread-local, and I have created the "/user/root" directory in HDFS. Here is a snippet of the program and the exception I get. Please advise.

pigServer = PigFactory.getServer();
URL path = getClass().getClassLoader().getResource("cfg/concatall.py");
LOG.info("CDNResolve2Hbase: reading concatall.py file from " + path.toString());
pigServer.getPigContext().getProperties().setProperty(PigContext.JOB_NAME, "CDNResolve2Hbase");
pigServer.registerQuery("A = load '" + inputPath + "' using PigStorage('\t') as (ip:chararray, do:chararray, cn:chararray, cdn:chararray, firsttime:chararray, updatetime:chararray);");
pigServer.registerCode(path.toString(), "jython", "myfunc");
pigServer.registerQuery("B = foreach A generate myfunc.concatall('" + extractTimestamp(inputPath) + "',ip,do,cn), cdn, SUBSTRING(firsttime,0,8);");
outputTable = "hbase://" + outputTable;
ExecJob job = pigServer.store("B", outputTable, "org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:cdn d:dtime')");

My PigFactory contains the following code:

private static ThreadLocal<PigServer> pigServer = new ThreadLocal<PigServer>();

public static synchronized PigServer getServer() {
    if (pigServer.get() == null) {
        try {
            printClassPath();
            Properties prop = SystemUtils.getCfg();
            pigServer.set(new PigServer(ExecType.MAPREDUCE, prop));
            return pigServer.get();
        } catch (Exception e) {
            LOG.error("error in starting PigServer:", e);
            return null;
        }
    }
    return pigServer.get();
}
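The thread-local pattern used by PigFactory can be illustrated in isolation. This is a minimal sketch with a placeholder `Server` class standing in for PigServer (the class and field names here are hypothetical, not from the original program); it shows that each thread lazily creates and keeps its own instance:

```java
public class ThreadLocalDemo {
    // Placeholder for an expensive per-thread resource such as PigServer
    static class Server {
    }

    private static final ThreadLocal<Server> SERVER = new ThreadLocal<Server>();

    public static Server getServer() {
        if (SERVER.get() == null) {
            // Lazily create one instance per calling thread
            SERVER.set(new Server());
        }
        return SERVER.get();
    }

    public static void main(String[] args) throws Exception {
        final Server[] seen = new Server[2];
        Thread worker = new Thread(new Runnable() {
            public void run() {
                seen[0] = getServer();
            }
        });
        worker.start();
        worker.join();
        seen[1] = getServer(); // main thread gets its own instance
        // The two threads received distinct instances
        System.out.println(seen[0] != seen[1]);
    }
}
```

Note that thread-local PigServer instances alone do not isolate everything: state cached at the process level (such as Hadoop's FileSystem cache, discussed in the answers below) is still shared across threads.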
  

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unable to check name hdfs://DC-001:9000/user/root
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:529)
    at com.hugedata.cdnserver.datanalysis.CDNResolve2Hbase.execute(Unknown Source)
    at com.hugedata.cdnserver.DatAnalysis.cdnResolve2Hbase(Unknown Source)
    at com.hugedata.cdnserver.task.HandleDomainNameLogTask.execute(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:273)
    at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:264)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
Caused by: Failed to parse: Pig script failed to parse:
    pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://DC-001:9000/user/root
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
    ... 15 more
Caused by:
    pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://DC-001:9000/user/root
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
    ... 16 more
Caused by: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://DC-001:9000/user/root
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:207)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:128)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:138)
    at org.apache.pig.parser.QueryParserUtils.getCurrentDir(QueryParserUtils.java:91)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:827)
    ... 22 more
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:873)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:200)
    ... 26 more

2 Answers:

Answer 0 (score: 0)

This looks like an HDFSClient problem. As I understand it, the FileSystem object is cached, so my guess is that one thread is closing it while another thread is still using it: Caused by: java.io.IOException: Filesystem closed
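The failure mode described here can be reproduced in miniature. The following sketch uses a toy `Handle` class standing in for Hadoop's process-wide cached FileSystem instance (the class and its methods are illustrative, not Hadoop API); once any holder closes the shared object, every other holder's subsequent call fails:

```java
import java.io.IOException;

public class SharedHandleDemo {
    // Toy stand-in for the single cached FileSystem object shared by all threads
    static class Handle {
        private volatile boolean open = true;

        void close() {
            open = false;
        }

        void read() throws IOException {
            if (!open) {
                throw new IOException("Filesystem closed");
            }
        }
    }

    public static void main(String[] args) {
        Handle cached = new Handle(); // both callers get the same cached object
        cached.close();               // caller A finishes its job and closes it
        try {
            cached.read();            // caller B still holds the now-stale handle
        } catch (IOException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

This is why thread-local PigServer instances do not help on their own: the FileSystem underneath them is looked up from a shared cache keyed by URI and user, not per thread.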

Take a look at other SO posts about multiple FileSystem instances: Multiples Hadoop FileSystem instances
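One commonly suggested workaround in those threads is to disable Hadoop's FileSystem cache so that each lookup returns an independent instance that no other thread can close from under you. This is a sketch only: verify that the `fs.hdfs.impl.disable.cache` property is honored by your Hadoop 1.0.3 build before relying on it. Shown here as a plain `Properties` manipulation of the kind PigFactory already does when building its configuration:

```java
import java.util.Properties;

public class DisableFsCacheDemo {
    public static void main(String[] args) {
        // Properties of the kind that would be handed to
        // new PigServer(ExecType.MAPREDUCE, prop) in PigFactory
        Properties prop = new Properties();

        // Ask Hadoop not to cache/share FileSystem instances, so one
        // thread closing its handle cannot invalidate another thread's.
        // (Assumption: the property is supported by the Hadoop version in use.)
        prop.setProperty("fs.hdfs.impl.disable.cache", "true");

        System.out.println(prop.getProperty("fs.hdfs.impl.disable.cache"));
    }
}
```

The trade-off is that every uncached FileSystem instance opens its own connections and must be closed explicitly, so expect somewhat higher resource usage.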

Answer 1 (score: 0)

The error you are getting suggests that you are not using the same hadoop client version as your server. Can you check the version of hadoop installed locally?