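I have XML files that I can split so that different mappers each get a chunk of the XML data (using a custom record reader).
Now I want to do the same for bzip2-compressed XML files.
Documentation everywhere says bzip2 is splittable.
If it is splittable, my existing code should run without any changes. But it doesn't work.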
The Hadoop version is hadoop-1.2.1.
I don't want https://github.com/whym/wikihadoop. I want to understand what is going on rather than copy code.
Error message:
13/10/10 06:52:49 ERROR security.UserGroupInformation: PriviledgedActionException as:admin cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost/xmlinputformat/_xmlinputformat_sample.xml.bz2
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost/xmlinputformat/_xmlinputformat_sample.xml.bz2
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at com.rhl.xmlinputformat.XmlInputFormat.listStatus(XmlInputFormat.java:42)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at com.rhl.xmlinputformat.XmlInputFormat.getSplits(XmlInputFormat.java:56)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at com.rhl.xmlinputformat.XmlInputFormatDriver.run(XmlInputFormatDriver.java:252)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.rhl.xmlinputformat.XmlInputFormatDriver.main(XmlInputFormatDriver.java:260)