Exception when integrating Solr with UIMA

Time: 2017-03-19 06:11:07

Tags: solr lucene uima alchemyapi opencalais

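I am trying to integrate UIMA with Solr, following the steps described at https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. When I try to index a document, an exception is thrown in the terminal and an error is also recorded in the Solr log. I have been trying to resolve this for a while but have not found a solution. I have included all the JARs mentioned in the documentation, and I have generated valid keys for the APIs.

Analyzed field: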

<arr name="fields">
          <str>content</str>
        </arr>

The content field is of field type text_general. It is not a copy field. It holds the content of the corresponding document.

<field name="content" type="text_general" indexed="true" termOffsets="true" stored="true" termPositions="true" termVectors="true" multiValued="true" required="true"/>

In solrconfig.xml:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <!-- Set to true if you want to continue indexing even if text processing fails.
           Default is false. That is, Solr throws RuntimeException and
           never indexed documents entirely in your session. -->
      <bool name="ignoreErrors">true</bool>
      <str name="logField">fileName</str>
      <!-- This is optional. It is used for logging when text processing fails.
           If logField is not specified, uniqueKey will be used as logField.
      <str name="logField">id</str>
      -->
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
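
To help isolate the failure, a single small document can be sent through the same chain. The following is only a minimal sketch that builds such a request without sending it. The host and port, the helper name `build_update_request`, and the sample document are assumptions for illustration; the core name `star` is taken from the solr.log lines in this question, and `update.chain` is the same parameter the request handler above sets as a default.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_update_request(base_url, core, doc, chain="uima"):
    """Return (url, body) for an XML add request routed through `chain`.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    # update.chain selects the updateRequestProcessorChain by name.
    params = urlencode({"update.chain": chain, "commit": "true"})
    url = f"{base_url}/{core}/update?{params}"

    # Build <add><doc><field name="...">value</field>...</doc></add>.
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = value
    body = ET.tostring(add, encoding="unicode")
    return url, body


url, body = build_update_request(
    "http://localhost:8983/solr",  # assumed host and port
    "star",                        # core name as it appears in solr.log
    {"id": "doc-1", "content": "Some short test sentence."},  # sample data
)
print(url)
print(body)
```

The body can then be posted to the URL with `Content-Type: text/xml` using any HTTP client. Keeping the test document short makes it easier to see whether the exception depends on the document content or on the annotator configuration.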

Terminal error trace:

Mar 18, 2017 2:51:53 PM WhitespaceTokenizer typeSystemInit
INFO: "Whitespace tokenizer typesystem initialized"
Mar 18, 2017 2:51:53 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer starts processing"
Mar 18, 2017 2:51:53 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer finished processing"
Mar 18, 2017 2:51:53 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCalaisAnnotator.java:206)
        at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
.....

solr.log:

2017-03-19 05:41:24.466 WARN  (qtp1389647288-13) [   x:star] o.a.s.u.p.UIMAUpdateRequestProcessor skip the text processing due to null. id=3aedc166-c9ad-4b30-8bcb-d27177d2ae16,  text="nullget acquainted with ams application release readiness  confidential – not for distribution    1 ..."
2017-03-19 05:41:24.492 INFO  (qtp1389647288-13) [   x:star] o.a.s.u.p.LogUpdateProcessorFactory [star]  webapp=/solr path=/update params={wt=javabin&version=2}{add=[3aedc166-c9ad-4b30-8bcb-d27177d2ae16 (1562275568121020416)]} 0 12088
2017-03-19 05:41:39.493 INFO  (commitScheduler-10-thread-1) [   x:star] o.a.s.u.DirectUpdateHandler2 start 
...

I have been struggling to resolve this issue.

Thanks and regards.

0 answers:

No answers