ElasticSearch: highlighting with stemming

Time: 2019-04-20 05:56:30

Tags: elasticsearch

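I have read this question and tried to understand the documentation here, but it gets complicated.

The problem (I think):

[Update 1]

I am writing the code in Scala and talking to ES through the high-level Java API.

I have a stemming analyzer configured. If I search for responsibilities, I get results for both responsibilities and responsibility. Great.

However, only documents containing the term responsibilities are highlighted. This is because the search runs against the stemmed content, i.e. responsib, while the highlighting runs against the unstemmed content. So it finds responsibilities, which is the search term, but not responsibility, which is not the search term.

If I set the highlighter to highlight on the stemmed content, it returns nothing. My guess is that it is comparing responsib with responsibilities.

Search

I am using the Java high-level API. The problem is not the code itself. Currently I highlight only the content field, which returns only responsibilities. Highlighting content.english seems to return nothing.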

import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder

// Highlights only the unstemmed "content" field, using the unified highlighter.
private def buildHighlighter(): HighlightBuilder = {
  val highlightBuilder = new HighlightBuilder
  val highlightContent = new HighlightBuilder.Field("content")
  highlightContent.highlighterType("unified")
  highlightBuilder.field(highlightContent)
  highlightBuilder
}

Mapping (abridged)

{
	"settings": {
		"number_of_shards": 3,
		"analysis": {
			"filter": {
				"english_stop": {
					"type": "stop",
					"stopwords": "_english_"
				},
				"english_keywords": {
					"type": "keyword_marker",
					"keywords": []
				},
				"english_stemmer": {
					"type": "stemmer",
					"language": "english"
				},
				"english_possessive_stemmer": {
					"type": "stemmer",
					"language": "possessive_english"
				}
			},
			"analyzer": {
				"english": {
					"tokenizer": "standard",
					"filter": [
						"english_possessive_stemmer",
						"lowercase",
						"english_stop",
						"english_keywords",
						"english_stemmer"
					]
				}
			}
		}
	},
	"mappings": {
		"_doc": {
			"properties": {
				"title": {
					"type": "text",
					"fields": {
						"english": {
							"type": "text",
							"analyzer": "english"
						}
					}
				},
				"content": {
					"type": "text",
					"fields": {
						"english": {
							"type": "text",
							"analyzer": "english"
						}
					}
				}
			}
		}
	}
}

[Update 2]

The Scala code that implements the search:

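import org.elasticsearch.action.search.{SearchRequest, SearchResponse}
import org.elasticsearch.action.support.IndicesOptions
import org.elasticsearch.client.RequestOptions

// ESIndexName and client are defined elsewhere in the project.
def searchByField(indices: Seq[ESIndexName], terms: Seq[(String, String)], size: Int = 20): SearchResponse = {
  val searchRequest = new SearchRequest
  searchRequest.indices(indices.map(idx => idx.completeIndexName()): _*)
  searchRequest.source(buildTargetFieldsMatchQuery(terms, size))
  searchRequest.indicesOptions(IndicesOptions.strictSingleIndexNoExpandForbidClosed())

  client.search(searchRequest, RequestOptions.DEFAULT)
}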

and the query is built as follows:

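// Imports assume a 6.x-era client, where TimeValue lives in common.unit.
import java.util.concurrent.TimeUnit
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.{BoolQueryBuilder, MatchQueryBuilder}
import org.elasticsearch.search.builder.SearchSourceBuilder

// logger and standardAnalyzer are defined elsewhere in the project and are not shown here.
private def buildTargetFieldsMatchQuery(termsByField: Seq[(String, String)], size: Int): SearchSourceBuilder = {
  val query = new BoolQueryBuilder

  termsByField.foreach {
    case (field, term) =>
      if (field == "content") {
        logger.debug(field + " should have " + term)
        // Match on both the subfield (field + standardAnalyzer) and the plain field.
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase))
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }
      else if (field == "title") {
        logger.debug(field + " should have " + term)
        // Note: the trailing .boost only reads the current boost value; it does not set one.
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase())).boost
      }
      else {
        logger.debug(field + " should have " + term)
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }
  }

  val sourceBuilder: SearchSourceBuilder = new SearchSourceBuilder()
  sourceBuilder.query(query)
  sourceBuilder.from(0)
  sourceBuilder.size(size)
  sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS))
  // highlighter(...) returns the same builder, so this is the method's result.
  sourceBuilder.highlighter(buildHighlighter())
}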

1 answer:

Answer 0 (score: 0)

Using plain REST, the following works fine for me:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": []
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

POST test/_doc/
{
  "content": "This is my responsibility"
}

POST test/_doc/
{
  "content": "These are my responsibilities"
}

GET test/_search
{
  "query": {
    "match": {
      "content.english": "responsibilities"
    }
  },
  "highlight": {
    "fields": {
      "content.english": {
        "type": "unified"
      }
    }
  }
}

The result is:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "5D5PPGoBqgTTLzdtM-_Y",
    "_score" : 0.18232156,
    "_source" : {
      "content" : "This is my responsibility"
    },
    "highlight" : {
      "content.english" : [
        "This is my <em>responsibility</em>"
      ]
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "5T5PPGoBqgTTLzdtZe8U",
    "_score" : 0.18232156,
    "_source" : {
      "content" : "These are my responsibilities"
    },
    "highlight" : {
      "content.english" : [
        "These are my <em>responsibilities</em>"
      ]
    }
  }
]

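Looking at your Java / Groovy(?) code, it looks close enough to the example in the docs. Could you log the actual query being run, so we can work out what is going wrong? In general, it should work like this.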
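For comparison, here is a minimal sketch of how the REST request above might be expressed with the high-level client in Scala, and how the generated request body could be printed for logging. The object name HighlightSketch and the helper buildSource are hypothetical and not part of the asker's code. The package names assume a 6.x-era client.

```scala
import org.elasticsearch.index.query.MatchQueryBuilder
import org.elasticsearch.search.builder.SearchSourceBuilder
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder

// Hypothetical helper object, for illustration only.
object HighlightSketch {
  // The stemmed subfield from the mapping; queried AND highlighted.
  val StemmedField = "content.english"

  def buildSource(term: String): SearchSourceBuilder = {
    // Highlight the same subfield that the query targets.
    val highlight = new HighlightBuilder()
      .field(new HighlightBuilder.Field(StemmedField).highlighterType("unified"))

    new SearchSourceBuilder()
      .query(new MatchQueryBuilder(StemmedField, term))
      .highlighter(highlight)
  }

  def main(args: Array[String]): Unit = {
    // SearchSourceBuilder.toString renders the request body as JSON,
    // which is the "actual query" worth logging and comparing with the REST call.
    println(buildSource("responsibilities"))
  }
}
```

One thing to check when comparing the printed JSON with the REST request: by default (require_field_match is true) a field is highlighted only if the query matched on that same field, so highlighting content.english may come back empty if the query only targets content or a different subfield.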