Question

I am using a terms facet to get all the unique values of a field along with their counts, and I am getting wrong results:

term: web 
Count: 1191979 
term: misc 
Count: 1191979 
term: passwd 
Count: 1191979 
term: etc 
Count: 1191979

whereas the expected result is:

term: WEB-MISC /etc/passwd 
Count: 1191979

Here is my sample query:

{
  "facets": {
    "terms1": {
      "terms": {
        "field": "message"
      }
    }
  }
}

Answer 1

If reindexing is an option, the best approach is to change the mapping and mark this field as not_analyzed:

"your_field" : { "type": "string", "index" : "not_analyzed" }

If you need to keep an analyzed version of the field as well, you can use the multi field type:

"your_field" : {
  "type" : "multi_field",
  "fields" : {
    "your_field" : {"type" : "string", "index" : "analyzed"},
    "untouched" : {"type" : "string", "index" : "not_analyzed"}
  }
}

This way, you can keep using your_field in queries while running the facet on your_field.untouched.
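
As a rough illustration of that split, the request body could be assembled as below. This is only a sketch: the facet name terms1 is borrowed from the question, the term query on "passwd" is an assumed example search, and the body would still need to be sent to the search endpoint of your own index.

```python
import json

# Sketch of a search body: query the analyzed field, facet on the raw sub-field.
# "passwd" is an assumed example term; "terms1" is the facet name from the question.
request_body = {
    # The analyzed field holds lowercased tokens, so a single token matches.
    "query": {"term": {"your_field": "passwd"}},
    "facets": {
        "terms1": {
            # The not_analyzed sub-field keeps each value as one whole term.
            "terms": {"field": "your_field.untouched"}
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The query and the facet target different sub-fields on purpose: token-level matching stays available for search, while the facet counts whole values.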

Alternatively, if this field is stored, you can use a terms facet with a script field instead:

"facets" : {
  "term" : {
    "terms" : {
      "script_field" : "_fields.your_field.value"
    }
  }
}

As a last resort, if this field is not stored but the record source is stored in the index, you can try:

"facets" : {
  "term" : {
    "terms" : {
      "script_field" : "_source.your_field"
    }
  }
}

The first solution is the most efficient. The last one is the least efficient and may take a long time on a large index.

Answer 2

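Wow, I ran into the same problem today while running a terms aggregation on a recent Elasticsearch. After some googling and a partial understanding, I figured out how this annoying indexing works (it is actually quite simple).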

Queries can only find terms that actually exist in the inverted index.

When the following string is indexed:

"WEB-MISC /etc/passwd"

it is passed to an analyzer. The analyzer may tokenize it into

"WEB", "MISC", "etc" and "passwd"

along with their position details. These tokens may then be filtered to lowercase, giving

"web", "misc", "etc" and "passwd"

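So once the document is indexed, a search query can only see the 4 tokens above, not the complete string "WEB-MISC /etc/passwd". Depending on your requirements, you can use one of the following options: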

1. Change the default analyzer used by Elasticsearch ([link][1])
2. If analysis is not needed, turn the analyzer off by setting 'not_analyzed' on the fields you need
3. To make already-indexed data searchable this way, re-indexing is the only option
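
The tokenization described above can be mimicked in a few lines of Python. This is a toy model, not Elasticsearch's actual analyzer: the analyze function is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, and terms_facet simply counts documents per term.

```python
import re
from collections import Counter

def analyze(text):
    """Toy stand-in for an analyzer: split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def terms_facet(docs, analyzed=True):
    """Count, for each indexed term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # An analyzed field indexes each token; a not_analyzed field indexes
        # the whole value as a single term.
        terms = set(analyze(doc)) if analyzed else {doc}
        counts.update(terms)
    return dict(counts)

docs = ["WEB-MISC /etc/passwd"] * 3

print(terms_facet(docs, analyzed=True))   # four separate terms, same count each
print(terms_facet(docs, analyzed=False))  # one term: the full original string
```

With the analyzed variant every token gets the same count, which is the pattern seen in the question; with the not_analyzed variant the full string is the only term.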

Answer 3

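I briefly explained this problem and proposed two solutions here, where I discuss several approaches. One is to use not_analyzed to keep the string intact. But since that has the drawback of being case-sensitive, a better approach is to use a keyword tokenizer plus a lowercase filter.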
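
A minimal sketch of what such index settings might look like, written as a Python dict so it can be dumped to JSON. The analyzer name lowercase_keyword and the type name your_type are made up for illustration, and the mapping syntax assumes the older string-type mappings used elsewhere in this thread; check it against your Elasticsearch version.

```python
import json

# Hypothetical index settings: one custom analyzer that keeps the whole value
# as a single token (keyword tokenizer) and lowercases it (lowercase filter).
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {  # assumed name, pick your own
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "your_type": {  # assumed type name
            "properties": {
                "message": {"type": "string", "analyzer": "lowercase_keyword"}
            }
        }
    },
}

def lowercase_keyword(text):
    """Toy model of the analyzer above: one token, lowercased."""
    return [text.lower()]

print(json.dumps(index_settings, indent=2))
print(lowercase_keyword("WEB-MISC /etc/passwd"))
```

Under this setup the facet would return one lowercased term per distinct value, so values that differ only in case are counted together.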

How to prevent Facet Terms from tokenizing
