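I am trying to integrate Cloudera Hadoop with Apache Nutch. Unfortunately, when I try to crawl a website, the exception below is thrown. I have configured Nutch with plain HBase and Solr without any problems, but it seems that Cloudera has changed something inside HBase that Nutch's Gora module cannot understand.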
./crawl urls/seed.txt testCrawl localhost:8983/solr/ 2
InjectorJob: starting at 2014-05-18 17:29:33
InjectorJob: Injecting urlDir: urls/seed.txt
InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.NumberFormatException: For input string: "60000��u��Tm#PBUF
"
localhost�������("
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: For input string: "60000��u��Tm#PBUF
"
localhost�������("
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 7 more
Caused by: java.lang.NumberFormatException: For input string: "60000��u��Tm#PBUF
"
localhost�������("
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:63)
at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
... 9 more
Best regards.