Hadoop: java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/xerces/parsers/AbstractSAXParser

Date: 2014-04-03 17:56:44

Tags: java hadoop mapreduce

My previous question was:

Hadoop: java.lang.Exception: java.lang.RuntimeException: Error in configuring object

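I followed the suggestion there and packaged all the jar files into a single jar, which solved the first problem. Please refer to the previous post for the source code. Thanks in advance. But now a new problem has appeared: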

14/04/03 13:47:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/03 13:47:40 WARN snappy.LoadSnappy: Snappy native library is available
14/04/03 13:47:40 INFO snappy.LoadSnappy: Snappy native library loaded
14/04/03 13:47:40 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/03 13:47:40 INFO mapred.JobClient: Running job: job_local1748858601_0001
14/04/03 13:47:40 INFO mapred.LocalJobRunner: Waiting for map tasks
14/04/03 13:47:40 INFO mapred.LocalJobRunner: Starting task: attempt_local1748858601_0001_m_000000_0
14/04/03 13:47:40 INFO util.ProcessTree: setsid exited with exit code 0
14/04/03 13:47:40 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c943d1
14/04/03 13:47:40 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/project/input1/url.txt:0+68
14/04/03 13:47:40 INFO mapred.MapTask: numReduceTasks: 1
14/04/03 13:47:40 INFO mapred.MapTask: io.sort.mb = 100
14/04/03 13:47:40 INFO mapred.MapTask: data buffer = 79691776/99614720
14/04/03 13:47:40 INFO mapred.MapTask: record buffer = 262144/327680
Prepare to get into webpage
14/04/03 13:47:41 INFO mapred.JobClient:  map 0% reduce 0%
14/04/03 13:47:43 INFO mapred.LocalJobRunner: Map task executor complete.
14/04/03 13:47:43 WARN mapred.LocalJobRunner: job_local1748858601_0001
java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/xerces/parsers/AbstractSAXParser
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NoClassDefFoundError: org/apache/xerces/parsers/AbstractSAXParser
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at de.l3s.boilerpipe.sax.BoilerpipeSAXInput.getTextDocument(BoilerpipeSAXInput.java:51)
    at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:69)
    at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:87)
    at webPageToTxt.WebPageToTxt.webPageString(WebPageToTxt.java:82)
    at webPageToTxt.WebPageToTxt.multiWebPageString(WebPageToTxt.java:126)
    at webPageToTxt.WebPageToTxt.webPageToTxt(WebPageToTxt.java:40)
    at webPageToTxt.WebPageToTxtMapper.map(WebPageToTxtMapper.java:27)
    at webPageToTxt.WebPageToTxtMapper.map(WebPageToTxtMapper.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ClassNotFoundException: org.apache.xerces.parsers.AbstractSAXParser
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    ... 29 more
14/04/03 13:47:44 INFO mapred.JobClient: Job complete: job_local1748858601_0001
14/04/03 13:47:44 INFO mapred.JobClient: Counters: 0
14/04/03 13:47:44 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at webPageToTxt.ConfMain.run(ConfMain.java:33)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at webPageToTxt.ConfMain.main(ConfMain.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

1 answer:

Answer 0 (score: 0)

You need to add all the jars you are using to the jar in which your driver and MapReduce code reside, so that they are available to the mappers at runtime.
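
One way to confirm that the packaged job jar really contains the missing class is to look for its entry directly. The following is only a minimal sketch using the standard `java.util.jar` API; the class name `JarClassCheck` and its methods are hypothetical helpers, not part of Hadoop, and the jar path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks whether a jar has the .class entry for a given class name. */
final class JarClassCheck {
    private JarClassCheck() {}

    /** Converts a binary class name such as "a.b.C" to the jar entry name "a/b/C.class". */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has a top-level entry for className. */
    static boolean jarContainsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to your own job jar.
        String cls = "org.apache.xerces.parsers.AbstractSAXParser";
        System.out.println(cls + " present: " + jarContainsClass(args[0], cls));
    }
}
```

Note that this only inspects the jar's own entries. If the dependency jars were nested inside the job jar (for example under `lib/`) rather than unpacked into it, the class will not show up here even though it may still be shipped.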

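I went through the link you provided. Packaging the other classes as part of the MapReduce jar does work, but it is not always feasible. As you can see, you are using Xerces here, so you need to include the Xerces implementation jar (xercesImpl.jar).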

A better approach is to add these jars to the DistributedCache:

DistributedCache.addArchiveToClassPath(new Path("HDFS Path"), job);

You can keep the jars in HDFS. So the solution is to add the Xerces jar.
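
Whichever way the Xerces jar ends up being shipped, a quick probe can tell you whether the class is actually visible to the JVM that runs your code. This is a minimal sketch using only `Class.forName`; `ClasspathProbe` is a hypothetical helper name, and calling it from the mapper or the driver is an assumption about where you would want to check.

```java
/** Hypothetical helper: reports whether a class can be loaded from the current classpath. */
final class ClasspathProbe {
    private ClasspathProbe() {}

    /** Returns true if className can be located and loaded by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while resolving the class.
            return false;
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.xerces.parsers.AbstractSAXParser";
        System.out.println(cls + " loadable: " + isLoadable(cls));
    }
}
```

If this reports the class as not loadable inside the task, the jar is still missing from the task classpath, which matches the `ClassNotFoundException` at the bottom of the stack trace in the question.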