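I am developing an autocomplete feature for an application that provides full-text search of books.
I am trying to configure a Solr (v7.4.0) suggester with context filtering (e.g., restricting results to text from the pages of a single book) so that it returns the terms matching the supplied query, but instead it returns the entire contents of the field.
In the searchComponent definition in solrconfig.xml, this works fine (individual words are returned) when I use FuzzyLookupFactory, but that lookup implementation does not support context filtering. When I switch to AnalyzingInfixLookupFactory combined with DocumentDictionaryFactory to get context filtering (see the Solr docs), the whole field is returned.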
Example field value:
{
"id":"abc1234",
"ocrtext":"In choosing Colors for candy, certain qualifications are necessary. First, they must not fade or change"
}
In response to a query like this:
http://127.0.0.1:8983/solr/iiif_suggest?wt=json&q=col&suggest.cfq=456789
What I want is:
{
"responseHeader": {
"status": 0,
"QTime": 1
},
"suggest": {
"iiifSuggester": {
"col": {
"numFound": 1,
"suggestions": [
{
"term": "colors",
"weight": 0,
"payload": ""
}]}}}
}
But what I get is:
{
"responseHeader": {
"status": 0,
"QTime": 1
},
"suggest": {
"iiifSuggester": {
"col": {
"numFound": 1,
"suggestions": [
{
"term": "In choosing Colors for candy, certain qualifications are necessary. First, they must not fade or change",
"weight": 0,
"payload": ""
}]}}}
}
Here are the relevant solrconfig.xml settings:
<searchComponent name="iiif_suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="suggestAnalyzerFieldType">ocrtext_suggest</str>
<str name="contextField">is_page_of_ssim</str>
<str name="field">ocrtext</str>
</lst>
</searchComponent>
Here is the field definition in schema.xml:
<fieldType name="ocrtext_suggest" class="solr.TextField" positionIncrementGap="100">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<field name="ocrtext" type="ocrtext_suggest" indexed="true" stored="true" multiValued="false" />
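ocrtext_suggest is essentially modeled on the default Solr textSpell field type definition. However, I have found that the field must have stored="true" for any results to be returned at all.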
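When I look at the ocrtext field in the schema browser of the Solr admin UI and click Load Term Info, the field appears to be tokenized into individual terms. I don't understand how DocumentDictionaryFactory ends up storing the full field value.
Any suggestions would be greatly appreciated!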
Answer 0 (score: 0)
Try replacing the lookupImpl setting with <str name="lookupImpl">FreeTextLookupFactory</str>, as your requirements dictate.
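If the suggester has to stay on AnalyzingInfixLookupFactory in order to keep context filtering, one possible workaround is to trim the returned field text down to the matching words on the client side. The following is only a rough sketch under that assumption, not a Solr feature: the function names matching_words and terms_from_response are hypothetical, and the response dictionary simply copies the shape of the output shown in the question.

```python
import re


def matching_words(field_text, prefix):
    """Return the distinct lowercase words in field_text that start with prefix."""
    prefix = prefix.lower()
    seen = []
    # Split on anything that is not a letter or digit, mirroring the
    # [^a-zA-Z0-9] pattern used in the ocrtext_suggest field type.
    for word in re.split(r"[^a-zA-Z0-9]+", field_text.lower()):
        if word and word.startswith(prefix) and word not in seen:
            seen.append(word)
    return seen


def terms_from_response(response, suggester, query):
    """Collect matching words from every suggestion in a suggest response.

    Assumes the response has the same nesting as the JSON in the question:
    response["suggest"][suggester][query]["suggestions"].
    """
    suggestions = response["suggest"][suggester][query]["suggestions"]
    words = []
    for suggestion in suggestions:
        for word in matching_words(suggestion["term"], query):
            if word not in words:
                words.append(word)
    return words


# Response body copied from the question (already parsed from JSON).
response = {
    "suggest": {"iiifSuggester": {"col": {"numFound": 1, "suggestions": [
        {
            "term": "In choosing Colors for candy, certain qualifications "
                    "are necessary. First, they must not fade or change",
            "weight": 0,
            "payload": "",
        }
    ]}}}
}

print(terms_from_response(response, "iiifSuggester", "col"))  # ['colors']
```

This keeps the context filtering on the Solr side and only changes what is shown to the user, at the cost of losing any ranking between the individual words.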