Question

I would like to ask a question, using a simplified example. I have two fields in my schema.

<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" />
<field name="title" type="text_field" indexed="true" stored="true" multiValued="true"/>
<field name="content" type="text_field" indexed="true" stored="true" multiValued="true"/>

The title and content fields each hold values in two languages: 1. Turkish, 2. English.

title:[
    "Orta Doğu Teknik Üniversitesi",
    "Middle East Technical University"
]

content:[
    "Örnek içerik",
    "Example content"
]

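Problem

When I index this, I want to split the title and content fields into title_tr, title_en and content_tr, content_en.

I know I would use LanguageDetection to detect the language and update the document, but I don't know how to set it up for this.

I am using Solr 4.9.0.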

<field name="title_en" type="text_field" indexed="true" stored="true" multiValued="true" />
<field name="content_en" type="text_field" indexed="true" stored="true" multiValued="true" />
<field name="title_tr" type="text_field" indexed="true" stored="true" multiValued="true"/>
<field name="content_tr" type="text_field" indexed="true" stored="true" multiValued="true"/>

<copyField source="title" dest="title_tr"/>
<copyField source="content" dest="content_tr"/>
<copyField source="title" dest="title_en"/>
<copyField source="content" dest="content_en"/>

The result I want:

title_tr:[
    "Orta Doğu Teknik Üniversitesi"
]

title_en:[
    "Middle East Technical University"
]

content_tr:[
    "Örnek içerik"
]

content_en:[
    "Example content"
]

How can I do this?

Answer 1

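If you want to solve this, I suggest developing your own UpdateRequestProcessor [1].

You can implement your logic there:

1) Scan each value in the multi-valued field

2) Identify the language of that value (you can look at how this UpdateRequestProcessor does it [2])

3) Create a field text_ plus the detected language code (title_tr, title_en and so on in your case) holding the value to index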

[1] https://lucene.apache.org/solr/guide/6_6/update-request-processors.html

[2] https://lucene.apache.org/solr/6_6_0//solr-langid/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessorFactory.html
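
The three steps can be sketched in plain Java, without any Solr dependency, to show the per-value logic. This is only an illustration under assumptions: the class name `LanguageFieldSplitter` and both method names are hypothetical, and `detectLanguage` is a crude placeholder (it only looks for Turkish-specific letters) standing in for a real language detector. In an actual UpdateRequestProcessor the same loop would presumably live in `processAdd` and read from and write to the document being indexed rather than a `Map`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LanguageFieldSplitter {

    // Turkish-specific letters, written as Unicode escapes so the file
    // compiles the same way regardless of the source encoding.
    private static final String TURKISH_LETTERS =
            "\u00e7\u011f\u0131\u00f6\u015f\u00fc\u00c7\u011e\u0130\u00d6\u015e\u00dc";

    // Placeholder detector (an assumption for this sketch, not real detection):
    // any value containing a Turkish-specific letter is "tr", otherwise "en".
    // A real processor would call a language-detection library here.
    public static String detectLanguage(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (TURKISH_LETTERS.indexOf(value.charAt(i)) >= 0) {
                return "tr";
            }
        }
        return "en";
    }

    // Step 1: scan each value of the multi-valued field.
    // Step 2: identify the language of that value.
    // Step 3: add the value to the field named <field>_<language>.
    public static Map<String, List<String>> splitByLanguage(String field, List<String> values) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String value : values) {
            String target = field + "_" + detectLanguage(value);
            result.computeIfAbsent(target, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // The two title values from the question (non-ASCII letters escaped).
        List<String> titles = Arrays.asList(
                "Orta Do\u011fu Teknik \u00dcniversitesi",
                "Middle East Technical University");
        System.out.println(splitByLanguage("title", titles));
    }
}
```

With the sample titles from the question, the Turkish value ends up under title_tr and the English value under title_en. Note that the copyField rules in the question cannot do this on their own, because copyField copies every value of the source field to each destination.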

Solr: detecting the language in a multi-valued field
