Question

I am using the path hierarchy tokenizer for a field in Logstash / ElasticSearch. So if the path field is something like /a/b/c, the tokenizer turns it into

    /a
    /a/b
    /a/b/c

I would like to generate statistics such as

    a - 3 hits
    b - 2 hits
    c - 1 hit

What is the best way to do this? Also, I would like to know whether there is a way to add the folder depth in a separate field.

Answer 1

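For your custom purpose, I think you can specify a custom pattern analyzer on the field and run a terms aggregation on that field. An example follows.

Define the custom analyzer: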

PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "nonword": {
          "type": "pattern",
          "pattern": "/"
        }
      }
    }
  }
}

Create the mapping:

POST /test_index/_mapping/test_1
{
  "properties": {
    "dir": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "nonword",
      "fields": {
        "un_touched": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Note: the 'un_touched' sub-field is kept to preserve the original, unanalyzed version of the data.

Populate the data and run the aggregation:

GET /test_index/test_1/_search
{
  "aggs": {
    "my_agg": {
      "terms": {
        "field": "dir",
        "size": 0
      }
    }
  }
}
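
To see why this produces the counts asked for in the question, here is a rough Python sketch of what the analyzer and the terms aggregation do together. It is only an approximation under stated assumptions: the three sample paths and the helper names `pattern_tokens` and `terms_doc_counts` are hypothetical, and the lowercasing mimics the pattern analyzer's default behaviour as I understand it.

```python
import re
from collections import Counter


def pattern_tokens(path, pattern="/"):
    """Approximate the 'nonword' analyzer: split on the pattern,
    drop empty pieces, and lowercase each token."""
    return [piece.lower() for piece in re.split(pattern, path) if piece]


def terms_doc_counts(paths):
    """For each term, count how many documents contain it,
    which is what doc_count reports in a terms aggregation."""
    counts = Counter()
    for path in paths:
        # A document counts once per term, even if the term repeats in its path.
        counts.update(set(pattern_tokens(path)))
    return counts


# Hypothetical sample documents; the original answer does not list its data.
sample_dirs = ["/a", "/a/b", "/a/b/c"]
counts = terms_doc_counts(sample_dirs)
for term, hits in counts.most_common():
    print(term, "-", hits)
```

With those three sample paths, each folder name is counted once for every document whose path passes through it.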

Note: this is only a very small example; you should choose the pattern with great care.
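
The example above does not cover the folder-depth part of the question. One possible approach, which is my own assumption and not something the mapping above provides, is to compute the depth on the client before indexing and store it in its own numeric field. The field name `dir_depth` and the helpers `folder_depth` and `with_depth` are hypothetical.

```python
def folder_depth(path):
    """Number of non-empty segments in a slash-separated path."""
    return len([segment for segment in path.split("/") if segment])


def with_depth(doc, path_field="dir", depth_field="dir_depth"):
    """Return a copy of the document with the folder depth added.
    'dir_depth' is a hypothetical field name, not part of the mapping above."""
    enriched = dict(doc)
    enriched[depth_field] = folder_depth(doc[path_field])
    return enriched


doc = with_depth({"dir": "/a/b/c"})
print(doc)
```

If you go this way, the extra field would also need a numeric type in the mapping before you can aggregate or filter on it.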
