Galago 3.5 indexing

Time: 2014-03-14 10:19:36

Tags: search-engine information-retrieval lemur

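I downloaded the Galago 3.5 bin release and tried to index wiki-small.corpus by following the guide. Strangely, when I ran the build index command I got a File Not Found Exception for the .index file. When I explicitly used inputPath and indexPath that error went away, but now I get this exception: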

  

    Created executor: org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor@69107c05
    No server running!
    Use --server=true to enable web-based status page.
    Stage inputSplit completed with 0 errors.
    Mar 14, 2014 3:26:01 PM org.lemurproject.galago.core.parse.UniversalParser process
    INFO: Processing split: /Users/nanz/Downloads/wiki-small.corpus
    java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:137)
        at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:52)
        at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$TupleUnshredder.processTuple(DocumentSplit.java:2033)
        at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$DuplicateEliminator.processTuple(DocumentSplit.java:1989)
        at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyTuples(DocumentSplit.java:1705)
        at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntilFileId(DocumentSplit.java:1732)
        at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntil(DocumentSplit.java:1740)
        at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedReader.run(DocumentSplit.java:1940)
        at org.lemurproject.galago.tupleflow.FileOrderedReader.run(FileOrderedReader.java:76)
        at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:96)
        at java.lang.Thread.run(Thread.java:695)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.lemurproject.galago.core.parse.UniversalParser.constructParserWithSplit(UniversalParser.java:213)
        at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:132)
        ... 10 more
    Caused by: java.lang.NullPointerException
        at org.lemurproject.galago.core.index.KeyValueReader.getManifest(KeyValueReader.java:35)
        at org.lemurproject.galago.core.index.corpus.CorpusReader.init(CorpusReader.java:41)
        at org.lemurproject.galago.core.index.corpus.CorpusReader.<init>(CorpusReader.java:32)
        at org.lemurproject.galago.core.parse.CorpusSplitParser.<init>(CorpusSplitParser.java:33)
        ... 16 more
    Stage parsePostings completed with 1 error.
    java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    Exception in thread "main" java.util.concurrent.ExecutionException: Stage threw an exception:
        at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.waitForStages(JobExecutor.java:1062)
        at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.run(JobExecutor.java:971)
        at org.lemurproject.galago.tupleflow.execution.JobExecutor.runWithoutServer(JobExecutor.java:1122)
        at org.lemurproject.galago.tupleflow.execution.JobExecutor.runLocally(JobExecutor.java:1177)
        at org.lemurproject.galago.core.tools.AppFunction.runTupleFlowJob(AppFunction.java:101)
        at org.lemurproject.galago.core.tools.apps.BuildIndex.run(BuildIndex.java:789)
        at org.lemurproject.galago.core.tools.AppFunction.run(AppFunction.java:55)
        at org.lemurproject.galago.core.tools.App.run(App.java:82)
        at org.lemurproject.galago.core.tools.App.run(App.java:73)
        at org.lemurproject.galago.core.tools.App.main(App.java:69)
    Caused by: java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:99)
        at java.lang.Thread.run(Thread.java:695)

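I also tried building from source and got the same result. Can anyone point out where I am going wrong? Hardly anyone seems to have run into this, so a simple Google search did not turn up much.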

2 answers:

Answer 0: (score: 1)

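Solved. In case anyone else faces this problem: a friend of mine found that Galago will not work directly on the wiki-small.corpus file, because it tries to look for a corpus.keys that does not exist. Simply replace the .corpus file with a directory of documents and everything works. Specify the indexPath and inputPath parameters explicitly. Use "galago build help" to see the exact syntax. Cheers.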
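As a rough illustration of that advice, here is a minimal Python sketch that assembles a build command with both paths given explicitly and flags a .corpus input before anything is run. The function names, the example paths, and the `--indexPath=` / `--inputPath=` spelling are assumptions (the spelling is modelled on the `--server=true` hint in the log above); check "galago build help" for the real syntax.

```python
from pathlib import PurePath


def corpus_warning(input_path):
    """Return a warning if the input looks like a prebuilt .corpus file, else None."""
    if PurePath(input_path).suffix == ".corpus":
        # A .corpus file built by a different Galago version is what failed in the question.
        return ("input is a .corpus file; unless it was built by the same Galago "
                "version, point inputPath at a directory of documents instead")
    return None


def build_index_command(input_path, index_path, galago="galago"):
    """Assemble the argument list for a build with both paths given explicitly."""
    # Assumption: flags are written as --name=value; confirm with `galago build help`.
    return [galago, "build",
            f"--indexPath={index_path}",
            f"--inputPath={input_path}"]


if __name__ == "__main__":
    src = "/Users/nanz/Downloads/wiki-small"        # hypothetical directory of documents
    dst = "/Users/nanz/Downloads/wiki-small.index"  # hypothetical index location
    warning = corpus_warning(src)
    if warning:
        print("warning:", warning)
    print(" ".join(build_index_command(src, dst)))
```

The sketch only prints the command rather than running it, so the flags can be compared against the help output first.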

Answer 1: (score: 0)

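I know this is late, but the wiki-small.corpus file on the textbook website was built with an old version of Galago, the 1.0 series, which is kept in this Google Code repository: {{ 3}}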

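The newer versions of Galago (2.0 ... 3.5 ... 3.7) are part of new development under the Lemur project on SourceForge, and the corpus format has changed. If you have a corpus file that was built with Galago 3.5, then your command should work.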