我们正在构建一个Solr插件来链接我们的专有引擎。预期用途是完全替换标准标记器。 (这是背景:Hybrid search and indexing: words and token metadata in Solr)
尝试在Solr Admin中索引测试文档时:
id,title
12345,A test title
我想得到一个例外,我想,我的标记器正在开始。
配置更改(schema.xml)是:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<!--
<analyzer type="query">
<tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
-->
</fieldType>
<fieldType name="family_id_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
<!--
<filter class="com.linguasys.carabao.FamilyIDFilterFactory" />
-->
</analyzer>
</fieldType>
<fieldType name="role_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
<!--
<filter class="com.linguasys.carabao.RoleFilterFactory" />
-->
</analyzer>
</fieldType>
Web服务本身有效。 (过滤器已被注释掉,因为它们因某种类型的不匹配错误而崩溃,但以后会出现这种情况。)
例外情况如下。它不只是&#34;我做错了什么&#34;,它&#34;我在哪里获得更多信息?&#34;
org.apache.solr.common.SolrException: Exception writing document id 12345 to the index; possible analysis error.
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:870)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1024)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:693)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:655)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: input AttributeSource must not be null
at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:94)
at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:106)
at org.apache.lucene.analysis.TokenFilter.<init>(TokenFilter.java:33)
at org.apache.lucene.analysis.util.FilteringTokenFilter.<init>(FilteringTokenFilter.java:70)
at org.apache.lucene.analysis.core.StopFilter.<init>(StopFilter.java:60)
at org.apache.lucene.analysis.core.StopFilterFactory.create(StopFilterFactory.java:127)
at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:102)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:180)
at org.apache.lucene.document.Field.tokenStream(Field.java:554)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:597)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
... 35 more
",
答案 0 :(得分:2)
您需要验证调用yourTokenizer.create(java.io.Reader reader)
时会发生什么。从堆栈跟踪看起来,此方法返回null
,并且此值一直传播到AttributeSource.<init>(AttributeSource.java:94)
。此时返回null
是非法的,因此是例外。
找出正在发生的事情的最佳方法是启用调试器并停在上面提到的行。