Getting unique data in Elasticsearch

Date: 2017-07-17 08:15:06

Tags: elasticsearch

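I have the following data:

ID: 1, fldname: pawan
ID: 1, fldname: pawan1
ID: 1, fldname: pawan2
ID: 2, fldname: pawan3
ID: 3, fldname: pawan4
ID: 4, fldname: pawan5

I am trying to get unique data based on the ID field, similar to what we get in MySQL when we run a GROUP BY with the following query:

select * from table_name where fldname like 'pawan%' group by ID

This returns unique values. The same happens in Sphinx search when we use its group-by function.

Is there any way to get unique values in Elasticsearch?

Below is my sample mapping: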
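To make the goal concrete, here is a minimal Python sketch of what the GROUP BY query in the question would return for the six sample rows. It runs in memory with no database. The names `rows` and `group_by_id` are illustrative, and keeping the first row of each group is an assumption, since MySQL does not guarantee which row it returns for non-aggregated columns.

```python
# Sample rows from the question, in the order they were listed.
rows = [
    {"ID": 1, "fldname": "pawan"},
    {"ID": 1, "fldname": "pawan1"},
    {"ID": 1, "fldname": "pawan2"},
    {"ID": 2, "fldname": "pawan3"},
    {"ID": 3, "fldname": "pawan4"},
    {"ID": 4, "fldname": "pawan5"},
]


def group_by_id(rows, prefix):
    """Keep one row per ID among rows whose fldname starts with prefix.

    Assumption: the first matching row of each group is the one kept.
    """
    unique = {}
    for row in rows:
        if row["fldname"].startswith(prefix):
            # setdefault stores the row only if this ID has not been seen yet.
            unique.setdefault(row["ID"], row)
    return list(unique.values())


result = group_by_id(rows, "pawan")
for row in result:
    print(row)
```

The desired outcome is therefore one entry per distinct ID (four entries for this data), which is the behaviour the answers try to reproduce in Elasticsearch.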

2 answers:

Answer 0: (score: 0)

I suggest you modify your mapping slightly:

{
  "record" : {
    "dynamic" : "false",
    "_all" : {
      "enabled" : false
    },
    "properties" : {
      "docid" : {
        "type" : "long"
      },
      "flgname" : {
        "type" : "text"
      }
    }
  }
}

so that docid is a long, as in the mapping above.

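Then you can try a fuzzy match query for filtering, combined with aggregations, as in the request here, which retrieves the minimum, maximum, average and cardinality (number of distinct values) of docid: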

{
  "from" : 0,
  "size" : 10,
  "_source" : true,
  "query" : {
    "bool" : {
      "must" : [ {
        "match" : {
          "flgname" : {
            "query" : "pawan",
            "operator" : "OR",
            "fuzziness" : "1",
            "prefix_length" : 1,
            "max_expansions" : 50,
            "fuzzy_transpositions" : true,
            "lenient" : false,
            "zero_terms_query" : "NONE",
            "boost" : 1.0
          }
        }
      } ]
    }
  },
  "aggs" : {
    "my_cardinality" : {
      "cardinality" : {
        "field" : "docid"
      }
    },
    "my_avg" : {
      "avg" : {
        "field" : "docid"
      }
    },
    "my_min" : {
      "min" : {
        "field" : "docid"
      }
    },
    "my_max" : {
      "max" : {
        "field" : "docid"
      }
    }
  }
}

By the way, this is the result of running the above query on the data you proposed:

{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.9808292,
    "hits" : [ {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "40b5eac0-743b-4a6a-a06d-3ae4d56f4aca",
      "_score" : 0.9808292,
      "_source" : {
        "docid" : "1",
        "flgname" : "pawan"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "27821c39-e722-4361-bc07-0dcd5181a1ad",
      "_score" : 0.7846634,
      "_source" : {
        "docid" : "2",
        "flgname" : "pawan3"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "86fcd9c1-a688-4a6a-9c45-e91791a8b902",
      "_score" : 0.7846634,
      "_source" : {
        "docid" : "4",
        "flgname" : "pawan5"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "fb00a3cc-f1b8-4073-8808-f2ddbc4979e2",
      "_score" : 0.55451775,
      "_source" : {
        "docid" : "1",
        "flgname" : "pawan1"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "18e5e20d-17a7-4d59-b2f1-7bf325a4c4df",
      "_score" : 0.55451775,
      "_source" : {
        "docid" : "3",
        "flgname" : "pawan4"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "fbf49af6-f574-4ad2-8686-cbbedc5e70c4",
      "_score" : 0.23014566,
      "_source" : {
        "docid" : "1",
        "flgname" : "pawan2"
      }
    } ]
  },
  "aggregations" : {
    "my_cardinality" : {
      "value" : 4
    },
    "my_max" : {
      "value" : 4.0
    },
    "my_avg" : {
      "value" : 2.0
    },
    "my_min" : {
      "value" : 1.0
    }
  }
}
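The aggregation values in this response can be checked by hand against the six hits. The following is a small Python sketch of that check, not an Elasticsearch call: the `docids` list is copied from the `_source` of the hits above, and `summary` is an illustrative name.

```python
from statistics import mean

# docid values of the six hits in the response above, in hit order.
docids = [1, 2, 4, 1, 3, 1]

summary = {
    "my_cardinality": len(set(docids)),  # number of distinct docid values
    "my_min": float(min(docids)),
    "my_max": float(max(docids)),
    "my_avg": float(mean(docids)),
}
print(summary)
```

Note that these are summary statistics: the cardinality says how many distinct docid values matched (4), but the hits list still contains every matching document, including the three that share docid 1.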

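Answer 1: (score: 0)

If you make flgname a keyword as well, you can use a sub-aggregation: aggregate over docid and sub-aggregate over flgname. The result will be similar to the SQL query you mentioned.

The query looks like this: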

{
  "size": 0,
  "query": {
    "regexp": {
      "flgname": "pawa.*"
    }
  },
  "aggs": {
    "docids": {
      "terms": { "field": "docid" },
      "aggs": {
        "flgnam": {
          "terms": { "field": "flgname" }
        }
      }
    }
  }
}
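To illustrate the shape of the buckets such a nested terms aggregation produces, here is an in-memory Python simulation over the sample data. It does not call Elasticsearch. `nested_terms` and `docs` are hypothetical names, and the bucket ordering (document count descending, then key ascending) is my understanding of the default terms ordering rather than something stated in the answer.

```python
import re
from collections import Counter, defaultdict

# Sample documents, using the field names from the answers (docid, flgname).
docs = [
    {"docid": 1, "flgname": "pawan"},
    {"docid": 1, "flgname": "pawan1"},
    {"docid": 1, "flgname": "pawan2"},
    {"docid": 2, "flgname": "pawan3"},
    {"docid": 3, "flgname": "pawan4"},
    {"docid": 4, "flgname": "pawan5"},
]


def nested_terms(docs, pattern):
    """Group matching docs by docid, then count flgname values per group."""
    regex = re.compile(pattern)
    groups = defaultdict(Counter)
    for doc in docs:
        # fullmatch mirrors a regexp query, which must match the whole term.
        if regex.fullmatch(doc["flgname"]):
            groups[doc["docid"]][doc["flgname"]] += 1

    buckets = []
    for docid, names in groups.items():
        # Sub-buckets: count descending, then key ascending.
        sub = sorted(names.items(), key=lambda kv: (-kv[1], kv[0]))
        buckets.append({
            "key": docid,
            "doc_count": sum(names.values()),
            "flgnam": [{"key": n, "doc_count": c} for n, c in sub],
        })
    # Outer buckets: count descending, then key ascending.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


buckets = nested_terms(docs, "pawa.*")
for bucket in buckets:
    print(bucket)
```

Each outer bucket corresponds to one distinct docid, which is the GROUP BY-like part; the inner `flgnam` buckets list the flgname values that fall under it.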