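I want to get the most frequently used terms in Elasticsearch

Time: 2018-12-18 00:16:15

Tags: elasticsearch aggregation

For each lowercased term, I want to get its most frequently used original-case form.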

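---- Here is what I have tried. ----

Create the index

Define two analyzers: the custom lower_whitespace analyzer (whitespace tokenizer plus lowercase filter) for txt, and the built-in whitespace analyzer for the txt.raw sub-field.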

put stacktest
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower_whitespace": {
          "filter": ["lowercase"],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "txt": {
          "type": "text",
          "analyzer": "lower_whitespace",
          "fielddata": true,
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "whitespace",
              "fielddata": true
            }
          }
        }
      }
    }
  }
}
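
To make the mapping concrete, here is a rough Python model of what the two analyzers emit for one input. It is a simplified stand-in, not Elasticsearch code: the function names are hypothetical, and str.split() and str.lower() only approximate the whitespace tokenizer and the lowercase filter for plain ASCII text like the sample data.

```python
def whitespace_tokens(text):
    """Approximate the built-in whitespace analyzer: split on whitespace, keep case."""
    return text.split()


def lower_whitespace_tokens(text):
    """Approximate the custom lower_whitespace analyzer: split, then lowercase each token."""
    return [token.lower() for token in whitespace_tokens(text)]


print(lower_whitespace_tokens("aWs dAk"))  # terms indexed for txt
print(whitespace_tokens("aWs dAk"))        # terms indexed for txt.raw
```

So txt only ever sees the lowercased terms, while txt.raw keeps each spelling as a separate term.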

Index the data

Insert 4 documents.

post stacktest/doc
{
"txt": "aws dak"
}

post stacktest/doc
{
"txt": "aWs dAk"
}

post stacktest/doc
{
"txt": "aWs dAk"
}

post stacktest/doc
{
"txt": "AWS DAK"
}

Get aggs (lowercased term -> raw term)

I tried the following aggregation.

get stacktest/_search
{
  "size": 0,
  "aggs": {
    "countterms": {
      "terms": {
        "field": "txt",
        "size": 10
      },
      "aggs": {
        "countrawterms": {
          "terms": {
            "field": "txt.raw",
            "size": 1
          }
        }
      }
    }
  }
}

The result is

{
  .....
  "aggregations": {
    "countterms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "aws",
          "doc_count": 4,
          "countrawterms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 4,
            "buckets": [
              {
                "key": "aWs",
                "doc_count": 2
              }
            ]
          }
        },
        {
          "key": "dak",
          "doc_count": 4,
          "countrawterms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 4,
            "buckets": [
              {
                "key": "aWs",
                "doc_count": 2
              }
            ]
          }
        }
      ]
    }
  }
}
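
The "aWs" under the "dak" bucket looks wrong at first, but it seems to follow from how a nested terms aggregation works: each bucket holds whole documents, and the sub-aggregation counts every txt.raw term in those documents, not only the token that produced the parent key. The Python below is a simplified model of that behavior, not Elasticsearch code. The helper name is hypothetical, and the tie-break (equal counts ordered by key ascending) is an assumption about the default ordering.

```python
from collections import Counter

# The four documents indexed above.
docs = ["aws dak", "aWs dAk", "aWs dAk", "AWS DAK"]


def top_raw_term(parent_key, docs):
    """Model a size-1 terms sub-aggregation on txt.raw inside one txt bucket."""
    counts = Counter()
    for text in docs:
        raw_terms = set(text.split())             # txt.raw terms of this document
        lowered = {t.lower() for t in raw_terms}  # txt terms of this document
        if parent_key in lowered:                 # document falls in the parent bucket
            counts.update(raw_terms)              # every raw term counts once per document
    # Highest doc count first; ties ordered by key ascending (assumed default).
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(top_raw_term("aws", docs))
print(top_raw_term("dak", docs))
```

In this model both calls return ("aWs", 2): every document contains both a form of "aws" and a form of "dak", so both buckets hold the same four documents, "aWs" and "dAk" each appear in two of them, and "aWs" sorts first.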

The result is shown above. How can I get the result I want?

I want to get "aWs" for the "aws" bucket and "dAk" for the "dak" bucket.
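
One possible workaround, sketched here in Python, is to raise the sub-aggregation's size so that all raw variants come back (for example 10 instead of 1), then pick on the client the first raw key whose lowercased form equals the parent key. This is a sketch under assumptions, not a confirmed fix: the function name is hypothetical, it assumes the response has been parsed into a dict shaped like the result above, and it assumes size is large enough that the wanted variant is among the returned buckets.

```python
def most_common_raw_per_term(response):
    """Map each lowercased term to its most frequent original-case variant."""
    result = {}
    for bucket in response["aggregations"]["countterms"]["buckets"]:
        parent_key = bucket["key"]
        # Sub-buckets arrive sorted by doc_count descending, so the first
        # key that lowercases to the parent key is the most frequent variant.
        # str.lower() stands in for the lowercase filter (fine for ASCII).
        for sub in bucket["countrawterms"]["buckets"]:
            if sub["key"].lower() == parent_key:
                result[parent_key] = sub["key"]
                break
    return result


# Hand-built stand-in for a response with a larger sub-aggregation size;
# the counts follow from the four sample documents, not from a real run.
raw_buckets = [
    {"key": "aWs", "doc_count": 2}, {"key": "dAk", "doc_count": 2},
    {"key": "AWS", "doc_count": 1}, {"key": "DAK", "doc_count": 1},
    {"key": "aws", "doc_count": 1}, {"key": "dak", "doc_count": 1},
]
sample = {"aggregations": {"countterms": {"buckets": [
    {"key": "aws", "doc_count": 4, "countrawterms": {"buckets": raw_buckets}},
    {"key": "dak", "doc_count": 4, "countrawterms": {"buckets": raw_buckets}},
]}}}

print(most_common_raw_per_term(sample))
```

The trade-off is that the filtering happens outside Elasticsearch, so the sub-aggregation has to return more buckets than are finally used.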

Thanks.

0 answers:

No answers yet