langid UpdateRequestProcessor only maps the first field

Date: 2013-09-24 19:46:33

Tags: solr solr4 language-detection

I'm trying to use Solr's langid UpdateRequestProcessor. Here is my configuration:

<updateRequestProcessorChain name="languages">
    <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="invariants">
            <str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>
            <str name="langid.whitelist">en,fr</str>
            <str name="langid.fallback">en</str>
            <str name="langid.langField">detectedlang</str>
            <bool name="langid.map">true</bool>
            <bool name="langid.map.keepOrig">false</bool>
        </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

My fields look like this:

<fields>
    <field name="_root_" type="string" indexed="true" stored="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>

    <field name="id" type="string" indexed="true" stored="true" required="true" />

    <!-- raw fields from sql db -->
    <field name="expertise_id" type="int" indexed="true" stored="true" />
    <field name="person_id" type="int" indexed="true" stored="true" />
    <field name="mod_date" type="date" indexed="true" stored="true" />
    <field name="lang" type="string" indexed="true" stored="true" />
    <field name="focus" type="text_general" indexed="true" stored="true" />
    <field name="expertise" type="text_general" indexed="true" stored="true" />
    <field name="platforms" type="text_general" indexed="true" stored="true" />
    <field name="partners" type="text_general" indexed="true" stored="true" />
    <field name="participation" type="text_general" indexed="true" stored="true" />
    <field name="additional" type="text_general" indexed="true" stored="true" />
    <field name="tag" type="text_general" termVectors="true" multiValued="true" />
    <field name="facet_tag" type="string" stored="false" indexed="false" docValues="true" multiValued="true" default=""/>

    <!-- language detected by solr -->
    <field name="detectedlang" type="string" indexed="true" stored="true" />

    <!-- defined locale fields -->
    <dynamicField name="*_en" type="text_en" indexed="true" stored="true" />
    <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" />

    <copyField source="tag" target="facet_tag"/>

</fields>

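When I run an update or a data import, I know the "languages" update chain is being used, because focus is mapped to focus_en and detectedlang is set. However, none of the other fields listed in langid.fl are mapped. Why?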

Example update query:

{
  "additional": "here is some other information about me.",
  "expertise_id": "10000",
  "id": "foo_10000",
  "focus": "this is my new focus. It is very exciting. When I am done I expect to be super experienced."
}

Here is the result of querying for expertise_id=10000. Note that additional has not been moved to additional_en:

  "response":{"numFound":1,"start":0,"docs":[
      {
        "additional":"here is some other information about me.",
        "expertise_id":10000,
        "id":"foo_10000",
        "detectedlang":"en",
        "focus_en":"this is my new focus. It is very exciting. When I am done I expect to be super experienced.",
        "_version_":1447088846110982144}]
  }

1 answer:

Answer 0 (score: 1)

It turned out the problem was a syntax error. This line:

<str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>

must be

<str name="langid.fl">focus,expertise,platforms,partners,participation,additional</str>

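The docs say the field list should be a comma- or space-delimited list of values. Apparently the combination of a comma followed by a space breaks it (even though that works fine in other Solr contexts, such as fl in a requestHandler, which langid.fl is supposedly modeled on). I also tried the space-delimited syntax, but it did not solve my problem.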
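A plausible explanation for the symptom (an assumption, not confirmed against the Solr source) is that the value is split on commas without trimming, so every name after the first keeps a leading space and no longer matches a document field. The Python sketch below only illustrates that hypothesis; split_naive, split_trimmed and fields_found are hypothetical helpers, not Solr APIs.

```python
def split_naive(field_list):
    """Split on commas only, keeping any surrounding whitespace."""
    return field_list.split(",")


def split_trimmed(field_list):
    """Split on commas and strip whitespace around each name."""
    return [name.strip() for name in field_list.split(",") if name.strip()]


def fields_found(names, doc):
    """Return the configured names that exactly match a key in the document."""
    return [name for name in names if name in doc]


# Shortened version of the example update from the question.
doc = {
    "id": "foo_10000",
    "focus": "this is my new focus.",
    "additional": "here is some other information about me.",
}

# The original langid.fl value, with a space after each comma.
configured = "focus, expertise, platforms, partners, participation, additional"

# Untrimmed names such as " additional" do not match the key "additional".
naive_matches = fields_found(split_naive(configured), doc)

# After trimming, both fields present in the document are found.
trimmed_matches = fields_found(split_trimmed(configured), doc)

print(naive_matches)
print(trimmed_matches)
```

Under this assumption only focus survives the untrimmed split, which matches the observed behavior where focus was mapped to focus_en and additional was left alone.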

I hope this helps someone.