Nutch 2.3.1 on Yarn 2.7.1 error

Date: 2016-05-11 08:59:18

Tags: hadoop mapreduce yarn nutch

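Has anyone managed to run Nutch 2.3.1 on a Hadoop 2 cluster? I have been trying to get Nutch 2.3.1 running on my Hadoop/YARN 2.7.1 cluster for about two days now.

First of all, Nutch is installed only locally, not on every node. I have set up HBase as the storage backend.

At first, running it on the cluster failed because some libraries could not be found on the worker side. I worked around that by modifying the runtime/local/bin/nutch script so that all the libraries are shipped along with the jar to be executed: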

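# Start the -libjars list with the Nutch jar itself
LIBJARS="$NUTCH_HOME"/lib/apache-nutch-2.3.1.jar
# Append every jar under lib/ (comma-separated, as -libjars expects)
for f in "$NUTCH_HOME"/lib/*.jar; do
   LIBJARS="${LIBJARS},$f"
done

# run it
exec "${EXEC_CALL[@]}" "$CLASS" -libjars "$LIBJARS" "$@"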
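For reference, the same comma-separated list can be built by a small function. This is only a sketch: `join_libjars` is a hypothetical helper, not part of the stock bin/nutch script. Because the loop above already matches every `*.jar` in lib/, seeding the list with apache-nutch-2.3.1.jar first would list that jar twice if it also lives in lib/; this version globs once and inserts commas only between entries.

```shell
#!/usr/bin/env bash
# Sketch: build the comma-separated jar list that Hadoop's -libjars option expects.
# join_libjars is a hypothetical helper, not part of bin/nutch.
join_libjars() {
  local dir="$1" list="" f
  for f in "$dir"/*.jar; do
    [ -f "$f" ] || continue          # skip the literal pattern when nothing matches
    list="${list:+$list,}$f"          # add a comma only between entries
  done
  printf '%s\n' "$list"
}

# Usage inside the launcher (sketch):
#   LIBJARS=$(join_libjars "$NUTCH_HOME/lib")
```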

However, after fixing that, I ran into the following error, and I don't know how to solve it:

InjectorJob: starting at 2016-05-11 10:37:46
InjectorJob: Injecting urlDir: /user/ubuntu/urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
Error: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.net.MalformedURLException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:118)
    at org.apache.gora.mapreduce.GoraOutputFormat.getRecordWriter(GoraOutputFormat.java:88)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.net.MalformedURLException
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:132)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    ... 10 more
Caused by: java.net.MalformedURLException
    at java.net.URL.<init>(URL.java:630)
    at java.net.URL.<init>(URL.java:493)
    at java.net.URL.<init>(URL.java:442)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:518)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:865)
    at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:719)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:116)
    ... 12 more
Caused by: java.lang.NullPointerException
    at java.net.URL.<init>(URL.java:535)
    ... 25 more

InjectorJob: java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_1462952885071_0009
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)

1 answer:

Answer 0 (score: 0)

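Got it. First, I was trying to run the scripts in runtime/local/bin, which don't work on a cluster. The correct scripts to run in this case are the ones in runtime/deploy/bin. According to the Nutch wiki: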

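The Nutch job jar you find in $NUTCH_HOME/runtime/deploy is self-contained and ships with all the configuration files Nutch needs to run on any vanilla Hadoop cluster. All you need is a healthy cluster and a Hadoop environment (cluster or local) that points to the jobtracker.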
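A minimal pre-flight sketch, assuming NUTCH_HOME points at the Nutch source tree and that `ant runtime` has been run to populate runtime/deploy. `check_deploy_runtime` is a hypothetical helper: it only confirms that the deploy launcher and the self-contained .job file exist before anything is submitted to YARN.

```shell
#!/usr/bin/env bash
# Sketch: verify that `ant runtime` produced a usable deploy runtime.
# check_deploy_runtime is a hypothetical helper; pass it the Nutch source tree.
check_deploy_runtime() {
  local deploy="$1/runtime/deploy" f job=""
  # The deploy launcher submits the .job file to the cluster instead of running locally
  if [ ! -f "$deploy/bin/nutch" ]; then
    echo "missing $deploy/bin/nutch; run 'ant runtime' first" >&2
    return 1
  fi
  # The self-contained .job file bundles the conf/ files (nutch-site.xml, etc.)
  for f in "$deploy"/*.job; do
    if [ -f "$f" ]; then job="$f"; break; fi
  done
  if [ -z "$job" ]; then
    echo "no .job file in $deploy" >&2
    return 1
  fi
  printf '%s\n' "$job"
}

# Usage (not run here; assumes `hadoop` is on PATH and the seed URLs are already in HDFS):
#   check_deploy_runtime "$NUTCH_HOME" && "$NUTCH_HOME"/runtime/deploy/bin/nutch inject /user/ubuntu/urls
```

Since the configuration is packed into the .job file when it is built, edits to conf/nutch-site.xml only take effect on the cluster after re-running `ant runtime`.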

Also, and this is very important: for Nutch to be built correctly for distributed mode, nutch-site.xml must not contain a setting for plugin.folders. Mine contains:

<property>
  <name>http.agent.name</name>
  <value>Sofia's Nutch Spider</value>
</property>

<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>

<!-- If this is not set to -1, large pages are truncated and may not be processed to the end -->
<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>
<property>
  <name>file.crawl.parent</name>
  <value>false</value>
</property>

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- fs.default.name is deprecated in Hadoop 2 in favour of fs.defaultFS, but still honoured -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-master:8020</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-master</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop-master:8032</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- The key must match the aux-service name above (mapreduce_shuffle) -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop-master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop-master:8030</value>
</property>
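Because plugin.folders must not be set in nutch-site.xml for a distributed build, a quick check before running `ant runtime` can catch it. This is a sketch under the assumption that a plain grep is good enough: it does not parse XML, so a commented-out plugin.folders property would still be flagged. `check_no_plugin_folders` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: refuse to build the deploy runtime if nutch-site.xml overrides plugin.folders.
# check_no_plugin_folders is a hypothetical helper; it greps the file rather than parsing XML.
check_no_plugin_folders() {
  local site="$1"
  if [ ! -f "$site" ]; then
    echo "no such file: $site" >&2
    return 2
  fi
  # Match <name>plugin.folders</name>, tolerating whitespace inside the element
  if grep -q '<name>[[:space:]]*plugin\.folders[[:space:]]*</name>' "$site"; then
    echo "$site sets plugin.folders; remove it before running 'ant runtime'" >&2
    return 1
  fi
  return 0
}

# Usage (from the Nutch source tree):
#   check_no_plugin_folders conf/nutch-site.xml && ant runtime
```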