Querying and filtering unique documents in Elasticsearch

Time: 2016-04-23 14:01:38

Tags: search elasticsearch

I've added a bunch of our pet food data to an Elasticsearch index. The data looks like this:

{
  "title": "Pedigree Absolute Max, 700g Adult",
  "price": 7828,
  "supplier": "Madison Distributions",
  "supplierid": 241,
  "lastupdated": "2016-04-23"
},    
{
  "title": "Pedigree Smart Choice 1kg Adult",
  "price": 3428,
  "supplier": "Madison Distributions",
  "supplierid": 241,
  "lastupdated": "2016-04-23"
},
{
  "title": "Canagan Adult 1kg Refresh",
  "price": 3528,
  "supplier": "Madison Distributions",
  "supplierid": 241,
  "lastupdated": "2016-04-23"
},    
{
  "title": "Skinners 15Kg Puppy Kibble",
  "price": 9228,
  "supplier": "Madison Distributions",
  "supplierid": 241,
  "lastupdated": "2016-04-23"
},   
{
  "title": "Pedigree Absolute Max, 700 grams Adult Size",
  "price": 7628,
  "supplier": "Bay Pet",
  "supplierid": 313,
  "lastupdated": "2016-04-23"
},  
{
  "title": "Skinners 25Kg Puppy Kibble",
  "price": 10228,
  "supplier": "Bay Pet",
  "supplierid": 313,
  "lastupdated": "2016-04-23"
},
{
  "title": "Pedigree Absolute Max, 700g Adult",
  "price": 7428,
  "supplier": "Madison Distributions",
  "supplierid": 241,
  "lastupdated": "2016-04-22"
},    
{
  "title": "Pedigree Absolute Max, 700 grams Adult Size",
  "price": 7528,
  "supplier": "Bay Pet",
  "supplierid": 313,
  "lastupdated": "2016-04-22"
},
{
  "title": "Skinners 25Kg Puppy Kibble",
  "price": 107228,
  "supplier": "Bay Pet",
  "supplierid": 313,
  "lastupdated": "2016-04-21"
},  
{
  "title": "Pedigree Absolute Max, 700g Adult",
  "price": 7228,
  "supplier": "Madison Distributions",
  "supplierid": 241,
  "lastupdated": "2016-04-21"
},    
{
  "title": "Pedigree Absolute Max, 700 grams Adult Size",
  "price": 7328,
  "supplier": "Bay Pet",
  "supplierid": 313,
  "lastupdated": "2016-04-21"
}

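We index these suppliers every day and pick up their latest prices (the above is a very small sample!). I'm trying to query for the latest price from each supplier.

I have this, which seems to work for now: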

GET products/pets/_search
{
  "size": 0,
  "query": {
    "filtered": { 
      "query": {
        "match": { "title": "Pedigree" }
      }
    }
  },
  "aggs": {
    "sources": {
      "terms": {
        "field": "supplierid"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": 
            [
              "title",
              "supplier",
              "lastupdated",
              "price"
            ],
            "sort": {
              "lastupdated": "desc"
            }
          }
        }
      }
    }
  }
}
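
As a rough sketch of what the terms / top_hits aggregation above is meant to compute, here is the same "newest matching document per supplierid" logic in plain Python over a few of the sample documents. It does not talk to Elasticsearch; the helper name `latest_per_supplier` and the simple substring test are assumptions made for illustration only.

```python
# A few of the sample documents from the question (supplier name omitted).
DOCS = [
    {"title": "Pedigree Absolute Max, 700g Adult", "price": 7828,
     "supplierid": 241, "lastupdated": "2016-04-23"},
    {"title": "Pedigree Smart Choice 1kg Adult", "price": 3428,
     "supplierid": 241, "lastupdated": "2016-04-23"},
    {"title": "Canagan Adult 1kg Refresh", "price": 3528,
     "supplierid": 241, "lastupdated": "2016-04-23"},
    {"title": "Pedigree Absolute Max, 700 grams Adult Size", "price": 7628,
     "supplierid": 313, "lastupdated": "2016-04-23"},
    {"title": "Pedigree Absolute Max, 700g Adult", "price": 7428,
     "supplierid": 241, "lastupdated": "2016-04-22"},
    {"title": "Pedigree Absolute Max, 700 grams Adult Size", "price": 7528,
     "supplierid": 313, "lastupdated": "2016-04-22"},
]


def latest_per_supplier(docs, keyword):
    """Hypothetical helper: newest document per supplierid whose title contains keyword."""
    latest = {}
    for doc in docs:
        if keyword.lower() not in doc["title"].lower():
            continue
        best = latest.get(doc["supplierid"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        # On a tie the first document seen is kept.
        if best is None or doc["lastupdated"] > best["lastupdated"]:
            latest[doc["supplierid"]] = doc
    return latest


result = latest_per_supplier(DOCS, "Pedigree")
for supplierid, doc in sorted(result.items()):
    print(supplierid, doc["lastupdated"], doc["price"], doc["title"])
```

Note that supplier 241 has two different Pedigree products dated 2016-04-23, and keeping a single hit per supplier drops one of them. With `"size": 1` and a sort on `lastupdated` only, the query above has the same limitation, and which of the tied documents comes back is not something I'd rely on.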

This is the mapping I've gone with:

{
  "products": {
    "mappings": {
      "pets": {
        "properties": {
          "lastupdated": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "price": {
            "type": "long"
          },
          "query": {
            "properties": {
              "match": {
                "type": "string"
              }
            }
          },
          "supplier": {
            "type": "string"
          },
          "supplierid": {
            "type": "long"
          },
          "title": {
            "type": "string"
          }
        }
      }
    }
  }
}

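Am I doing this right? Is there anything I'm overlooking? The indices will be split by month. We monitor around 10,000 products a day from our suppliers. I haven't done any text cleaning yet (still prototyping!), so some entries are the same product, possibly with extra text at the end of the title.

If I change the title in the query above to the following, I can't find "Pedigree Smart":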

"query": {
  "match": { "title": "Pedigree Smart" }
}

But that might be because I need to use a bool query and expand each keyword into its own must > match clause, I guess?
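
For reference, a match query ORs its terms by default, so "Pedigree Smart" matches any title containing either word. The sketch below builds two request bodies that require both words: a single match clause with `"operator": "and"`, and the bool > must > match expansion described above. The function names are made up for this example, and the bodies are only printed, not sent to a cluster.

```python
import json


def match_all_terms(field, text):
    """One match clause that requires every term of text (operator "and")."""
    return {"match": {field: {"query": text, "operator": "and"}}}


def bool_must(field, text):
    """The bool form: one must > match clause per whitespace-separated keyword."""
    return {"bool": {"must": [{"match": {field: word}} for word in text.split()]}}


query_and = match_all_terms("title", "Pedigree Smart")
query_bool = bool_must("title", "Pedigree Smart")

# Either dict can be used as the "query" part of the search request.
print(json.dumps({"query": query_and}, indent=2))
print(json.dumps({"query": query_bool}, indent=2))
```

Both forms should select the same documents on an analyzed `title` field; the `operator` variant is shorter when the keywords come from a single search box.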

1 Answer:

Answer 0: (score: 1)

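In your mapping, mark the title field as not analyzed. Basically, Elasticsearch analyzes the field and tokenizes it on the space between "Pedigree" and "Smart", so you won't be able to search for it as a whole. That's part of the problem.

Change this: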

  "title": {
    "type": "string"
  }
}

To:

  "title": {
    "type": "string",
    "index": "not_analyzed"
  }
}
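
To show what the two settings do to a title, here is a rough Python sketch. The `analyzed_terms` function is only an approximation of the default analyzer (lowercasing plus word splitting) and is an assumption for illustration, not Elasticsearch's actual implementation; both function names are hypothetical.

```python
import re


def analyzed_terms(text):
    """Rough stand-in for an analyzed string field: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def not_analyzed_terms(text):
    """A not_analyzed string is indexed as one exact, unmodified term."""
    return {text}


title = "Pedigree Smart Choice 1kg Adult"

# Analyzed: each word is its own searchable term.
print(sorted(analyzed_terms(title)))
print("pedigree" in analyzed_terms(title))

# Not analyzed: only the complete, exact title is a term.
print("Pedigree Smart" in not_analyzed_terms(title))
print(title in not_analyzed_terms(title))
```

The trade-off is that a not_analyzed title only matches when the query is the complete title, character for character, so partial searches such as "Pedigree" would need a separate analyzed field.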