How to get the average number of missing fields per document with Elasticsearch?

Date: 2017-10-18 10:32:46

Tags: elasticsearch elasticsearch-aggregation

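In short: with Elasticsearch, given a list of fields, how can I get the average number of missing fields per document as an aggregation?

Details

Using the missing aggregation type, I can get the total number of documents that are missing a given field. So, with the following data: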

"hits": [{
    "name": "A name",
    "nickname": "A nickname",
    "bestfriend": "A friend",
    "hobby": "An hobby"
},{
    "name": "A name",
    "hobby": "An hobby"
},{
    "name": "A name",
    "nickname": "A nickname",
    "hobby": "An hobby"
},{
    "name": "A name",
    "bestfriend": "A friend"
}]
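To make the semantics of the missing aggregation concrete, here is a minimal Python sketch that applies the same counting rule to an in-memory copy of the four sample documents above. It is only an illustration: the names DOCS, FIELDS and missing_counts are hypothetical, and it treats a field as missing when the key is absent or its value is null, without modelling other Elasticsearch rules (such as empty arrays).

```python
# Hypothetical in-memory copy of the four sample documents above.
DOCS = [
    {"name": "A name", "nickname": "A nickname", "bestfriend": "A friend", "hobby": "An hobby"},
    {"name": "A name", "hobby": "An hobby"},
    {"name": "A name", "nickname": "A nickname", "hobby": "An hobby"},
    {"name": "A name", "bestfriend": "A friend"},
]

FIELDS = ["name", "nickname", "hobby", "bestfriend"]


def missing_counts(docs, fields):
    """Count, for each field, how many documents have no value for it.

    Simplification (assumption): a field is missing when the key is absent
    or its value is None; empty arrays and unmapped fields are not modelled.
    """
    return {
        field: sum(1 for doc in docs if doc.get(field) is None)
        for field in fields
    }


if __name__ == "__main__":
    for field, count in missing_counts(DOCS, FIELDS).items():
        print(f"{field}_missing: {count}")
```

Each count corresponds to the doc_count of one "<field>_missing" bucket.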

I can run the following query:

{
    "aggs": {
        "name_missing": {
            "missing": {"field": "name"}
        },
        "nickname_missing": {
            "missing": {"field": "nickname"}
        },
        "hobby_missing": {
            "missing": {"field": "hobby"}
        },
        "bestfriend_missing": {
            "missing": {"field": "bestfriend"}
        }
    }
}
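Since the question starts from "a list of fields", the query above can be generated rather than written by hand. A small sketch, assuming the same "<field>_missing" naming scheme; missing_aggs is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def missing_aggs(fields):
    """Build one "missing" aggregation per field, named "<field>_missing".

    Hypothetical convenience helper; the naming mirrors the query above.
    """
    return {
        "aggs": {
            f"{field}_missing": {"missing": {"field": field}}
            for field in fields
        }
    }


if __name__ == "__main__":
    body = missing_aggs(["name", "nickname", "hobby", "bestfriend"])
    print(json.dumps(body, indent=4))
```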

and I get the following aggregations:

...
"aggregations": {
    "name_missing": {
        "doc_count": 0
    },
    "nickname_missing": {
        "doc_count": 2
    },
    "hobby_missing": {
        "doc_count": 1
    },
    "bestfriend_missing": {
        "doc_count": 2
    }
}
...

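What I need now is the average number of missing fields per document. I can do the math on the results in code:

  • sum the doc_count of all the missing aggregations
  • divide by the total number of hits

But how can I get the same result directly from Elasticsearch, as an aggregation?

Thanks for your replies/suggestions.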
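The two steps above can be sketched client-side as follows. This is a hedged example: it assumes the response is the parsed JSON of the search above, with one "<field>_missing" aggregation per field; average_missing is a hypothetical name, and the response in the usage block is made up purely to show the call.

```python
def average_missing(response, fields):
    """Average number of missing fields per document, computed client-side.

    Assumes `response` is the parsed JSON body of a search that ran one
    "<field>_missing" missing aggregation per field.
    """
    # Step 1: sum the doc_count of every "<field>_missing" aggregation.
    total_missing = sum(
        response["aggregations"][f"{field}_missing"]["doc_count"]
        for field in fields
    )
    # Step 2: divide by the total number of hits. Elasticsearch 7+ reports
    # hits.total as {"value": n, "relation": ...} (a lower bound unless
    # track_total_hits is set to true); older versions report a plain integer.
    total = response["hits"]["total"]
    total_hits = total["value"] if isinstance(total, dict) else total
    return total_missing / total_hits if total_hits else 0.0


if __name__ == "__main__":
    # Hypothetical response with made-up counts, only to show the call.
    fake_response = {
        "hits": {"total": 4},
        "aggregations": {
            "a_missing": {"doc_count": 2},
            "b_missing": {"doc_count": 3},
        },
    }
    print(average_missing(fake_response, ["a", "b"]))  # prints 1.25
```

Returning 0.0 when there are no hits is a design choice of this sketch that avoids a division by zero.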

1 answer:

Answer 0 (score: 1)

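It's an ugly solution, but it does the trick. The terms aggregation on the constant script 'aaa' puts every hit into a single bucket. That bucket is needed because bucket_script is a parent pipeline aggregation and can only run inside a multi-bucket aggregation.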

GET missing/missing/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "script": "'aaa'"
      },
      "aggs": {
        "name_missing": {
          "missing": {
            "field": "name"
          }
        },
        "nickname_missing": {
          "missing": {
            "field": "nickname"
          }
        },
        "hobby_missing": {
          "missing": {
            "field": "hobby"
          }
        },
        "bestfriend_missing": {
          "missing": {
            "field": "bestfriend"
          }
        },
        "avg_missing": {
          "bucket_script": {
            "buckets_path": {
              "name_missing": "name_missing._count",
              "nickname_missing": "nickname_missing._count",
              "hobby_missing": "hobby_missing._count",
              "bestfriend_missing": "bestfriend_missing._count",
              "count": "_count"
            },
            "script": "(params.name_missing + params.nickname_missing + params.hobby_missing + params.bestfriend_missing) / params.count"
          }
        }
      }
    }
  }
}

In buckets_path, each entry works like a variable definition: name_missing._count takes the doc_count of the name_missing aggregation, and the same goes for nickname_missing, hobby_missing and bestfriend_missing, while "count": "_count" takes the doc_count of the bucket the aggregation runs on, i.e. the total number of hits. The script then adds up all the missing counts and divides by the total number of hits, which is what you asked for. With Painless (the default script language since Elasticsearch 5.0), these variables are read as params.<name>.

I've shown you how to do it; now you can massage the parameters and pull out whatever you need.
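As one possible way of "massaging the parameters", here is a hedged Python sketch that builds the answer's single-bucket + bucket_script request for an arbitrary list of fields and reads the result back. The helper names are hypothetical. It assumes Painless, so the script reads the buckets_path variables as params.<name>, and it assumes field names that are valid script identifiers. Sending the request (with any HTTP client) is left out. The result lives in the single "result" bucket as avg_missing.value.

```python
import json


def avg_missing_request(fields):
    """Build the answer's request body (constant terms bucket + bucket_script) for any fields.

    Assumption: field names are valid Painless identifiers, since they are
    reused as buckets_path variable names.
    """
    sub_aggs = {f"{f}_missing": {"missing": {"field": f}} for f in fields}
    # Each variable points at the doc_count of one missing aggregation...
    buckets_path = {f"{f}_missing": f"{f}_missing._count" for f in fields}
    # ...and "count" points at the doc_count of the enclosing bucket (all hits).
    buckets_path["count"] = "_count"
    numerator = " + ".join(f"params.{f}_missing" for f in fields)
    sub_aggs["avg_missing"] = {
        "bucket_script": {
            "buckets_path": buckets_path,
            "script": f"({numerator}) / params.count",
        }
    }
    return {
        "size": 0,
        "aggs": {
            "result": {
                # Constant script: every hit lands in the same bucket.
                "terms": {"script": "'aaa'"},
                "aggs": sub_aggs,
            }
        },
    }


def read_avg_missing(response):
    """Return avg_missing.value from the single 'result' bucket, or None if there were no hits."""
    buckets = response["aggregations"]["result"]["buckets"]
    return buckets[0]["avg_missing"]["value"] if buckets else None


if __name__ == "__main__":
    body = avg_missing_request(["name", "nickname", "hobby", "bestfriend"])
    print(json.dumps(body, indent=2))
```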