Indexing crashes on custom tokenizer

Time: 2014-07-25 02:30:03

Tags: solr lucene analyzer

We are building a Solr plugin to link to our proprietary engine. The intended use is to replace the standard tokenizer completely. (Background: Hybrid search and indexing: words and token metadata in Solr)

When I try to index a test document in Solr Admin:

id,title
12345,A test title

I get an exception, I think, at the point where my tokenizer is starting up.

The configuration changes (schema.xml) are:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
<!--
     <analyzer type="query">
        <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer> 
     <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
-->
    </fieldType>
    <fieldType name="family_id_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
            <!--
            <filter class="com.linguasys.carabao.FamilyIDFilterFactory" />
            -->
          </analyzer>
    </fieldType>

    <fieldType name="role_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
            <!--
            <filter class="com.linguasys.carabao.RoleFilterFactory" />
            -->
          </analyzer>
    </fieldType>

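The web service itself works. (The filters are commented out because they crash with some kind of type mismatch error, but that is a problem for later.)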

The exception is below. The question is not just "what am I doing wrong?" but also "where do I get more information?"

org.apache.solr.common.SolrException: Exception writing document id 12345 to the index; possible analysis error.
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:870)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1024)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:693)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
  at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
  at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
  at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
  at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:655)
  at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: input AttributeSource must not be null
  at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:94)
  at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:106)
  at org.apache.lucene.analysis.TokenFilter.<init>(TokenFilter.java:33)
  at org.apache.lucene.analysis.util.FilteringTokenFilter.<init>(FilteringTokenFilter.java:70)
  at org.apache.lucene.analysis.core.StopFilter.<init>(StopFilter.java:60)
  at org.apache.lucene.analysis.core.StopFilterFactory.create(StopFilterFactory.java:127)
  at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
  at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:102)
  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:180)
  at org.apache.lucene.document.Field.tokenStream(Field.java:554)
  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:597)
  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
  ... 35 more

1 answer:

Answer 0 (score: 2)

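You need to verify what happens when yourTokenizer.create(java.io.Reader reader) is called. Judging by the stack trace, this method returns null, and that value propagates all the way to AttributeSource.<init>(AttributeSource.java:94), where StopFilter receives it as its input. A null input is illegal at that point, hence the exception.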

The best way to find out what is going on is to attach a debugger and stop at the line mentioned above (AttributeSource.java:94).
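
To make the suspected failure mode concrete, here is a minimal, dependency-free sketch. It does not use Lucene or Solr: FakeTokenizer, FakeStopFilter, FakeTokenizerFactory and both buildChain methods are hypothetical stand-ins invented for illustration, and the real ViaWebTokenizerFactory may fail for a different reason. It only shows how a null returned by a factory surfaces later, in the constructor of the next filter, and how a check at the factory boundary would point at the culprit sooner.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Objects;

public class NullTokenizerSketch {

    /** Stand-in for a token source; this is NOT Lucene's Tokenizer. */
    static class FakeTokenizer {
        final Reader input;
        FakeTokenizer(Reader input) { this.input = input; }
    }

    /** Stand-in for a filter whose constructor rejects a null input, as in the trace. */
    static class FakeStopFilter {
        final FakeTokenizer input;
        FakeStopFilter(FakeTokenizer input) {
            if (input == null) {
                throw new IllegalArgumentException("input AttributeSource must not be null");
            }
            this.input = input;
        }
    }

    /** Hypothetical interface mirroring the shape of a tokenizer factory. */
    interface FakeTokenizerFactory {
        FakeTokenizer create(Reader reader);
    }

    /** Builds tokenizer -> filter with no checks, so a null is only caught by the filter. */
    static FakeStopFilter buildChain(FakeTokenizerFactory factory, Reader reader) {
        return new FakeStopFilter(factory.create(reader));
    }

    /** Same chain, but fails at the factory boundary with a message naming the factory. */
    static FakeStopFilter buildChainChecked(FakeTokenizerFactory factory, Reader reader) {
        FakeTokenizer tokenizer = Objects.requireNonNull(
                factory.create(reader),
                "tokenizer factory returned null: " + factory.getClass().getName());
        return new FakeStopFilter(tokenizer);
    }

    public static void main(String[] args) {
        FakeTokenizerFactory broken = reader -> null; // simulates the suspected bug
        try {
            buildChain(broken, new StringReader("A test title"));
        } catch (IllegalArgumentException e) {
            System.out.println("unchecked: " + e.getMessage());
        }
        try {
            buildChainChecked(broken, new StringReader("A test title"));
        } catch (NullPointerException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

In the unchecked variant the error message mentions only the filter's input, which is why the trace in the question never names the custom tokenizer. If the real factory does turn out to return null, the same kind of early check (or simply throwing from the factory when the web service call fails) would make the cause visible without a debugger.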