Question

I have an index with 2.7M documents. Here is my query:

GET ad_index/ad_type/_search
{
  "size": 20,
  "sort": {
    "until": "desc"
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "state": 2
          }
        },
        {
          "range": {
            "until": {
              "gte": "now"
            }
          }
        }
      ]
    } 
  },
  "aggs" : {
    "categories": {
      "terms": {
        "field": "category_id",
        "size": 2000
      }
    }
  }
}

This query has no match subquery. I have 1 node, 1 shard and 0 replicas.

Query time is 60 ms; without the aggregation it is 40 ms. Hits: ~50000.

Is that OK, or can it be faster? I want 10 ms. MySQL gives me < 10 ms.

I use ES 2.4. The index size is 1.34 GB. I am not interested in scoring.

UPD。

My mapping:

{
  "ad_index": {
    "mappings": {
      "ad_type": {
        "properties": {
          "customer_id": {
            "type": "long"
          },
          "deleted": {
            "type": "long"
          },
          "dynamic_fields": {
            "properties": {
              "-icq-3": {
                "type": "string"
              },
              "phone-3": {
                "type": "string"
              },
              "email-3": {
                "type": "string"
              }

              //and 100 more sparse dynamic fields 

            }
          },
          "id": {
            "type": "long"
          },
          "category_id": {
            "type": "long"
          },
          "until": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          },
          "state": {
            "type": "long"
          },
          "text": {
            "type": "string"
          }
        }
      }
    }
  }
}

Other queries:

GET ad_index/ad_type/_search
{
  "size": 20,
  "sort": {
    "until": "desc"
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "state": 2
          }
        },
        {
          "range": {
            "until": {
              "gte": "now"
            }
          }
        },
        {
          "terms": {
            "category_id" : [1029, 121, ... here can be more than 200 values]
          }
        }
      ]
    } 
  },
  "aggs" : {
    "categories": {
      "terms": {
        "field": "category_id",
        "size": 2000
      }
    }
  }
}

GET ad_index/ad_type/_search
{
  "size": 20,
  "sort": {
    "until": "desc"
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "some text"
          }
        }
      ], 
      "filter": [
        {
          "term": {
            "state": 2
          }
        },
        {
          "range": {
            "until": {
              "gte": "now"
            }
          }
        },
        {
          "terms": {
            "category_id" : [1029, 121, ... here can be more than 200 values]
          }
        }
      ]
    } 
  },
  "aggs" : {
    "categories": {
      "terms": {
        "field": "category_id",
        "size": 2000
      }
    }
  }
}

Answer 1

That is a bit slow, given that your queries are simple.

Experiment with the number of shards for this number of documents; try 3-5 shards.
Tune the heap size by setting ES_HEAP_SIZE (https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html)
Profile your queries; see https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-profile.html
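
A rough sketch of the shard and profiling suggestions, in the same console syntax as the question. The index name ad_index_v2 is made up for illustration. The shard count is fixed when an index is created, so the documents would have to be indexed again into the new index, together with the mapping from the question. The second request is the question's first query with "profile": true added, nothing else changed.

```
# Assumption: ad_index_v2 is a hypothetical name for a new index with 3 shards.
PUT ad_index_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}

# The first query from the question with profiling switched on.
GET ad_index/ad_type/_search
{
  "profile": true,
  "size": 20,
  "sort": {
    "until": "desc"
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "state": 2
          }
        },
        {
          "range": {
            "until": {
              "gte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category_id",
        "size": 2000
      }
    }
  }
}
```

The response should then carry a profile section with timings per shard for the query components and collectors, which helps to see whether the term filter, the range filter or the collection of hits takes most of the time.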

Elasticsearch query on 2.7M documents takes 60 ms. Is that OK, or can it be faster?
