Why is Elasticsearch not returning the correct results?

Asked: 2018-06-12 12:35:54

Tags: elasticsearch

I am running Elasticsearch 6.2 as a two-node cluster.

GET _cluster/health

{
    "cluster_name": "cluster_name",
    "status": "green",
    "timed_out": false,
    "number_of_nodes": 2,
    "number_of_data_nodes": 2,
    "active_primary_shards": 47,
    "active_shards": 94,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 100
}

GET myindex/_settings

{
  "myindex": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "analysis": {
          "analyzer": {
            "url_split_analyzer": {
              "filter": "lowercase",
              "tokenizer": "url_split"
            }
          },
          "tokenizer": {
            "url_split": {
              "pattern": "[^a-zA-Z0-9]",
              "type": "pattern"
            }
          }
        },
        "number_of_replicas": "1",
        "version": {
          "created": "6020499"
        }
      }
    }
  }
}

Here is a snapshot of the _mappings structure:

    "myindex": {
        "mappings": {
          "mytype": {
            "properties": {
              "@timestamp": {
                "type": "date"
              },
              ............
              "active": {
                "type": "short"
              },
              "id_domain": {
                "type": "short",
                "ignore_malformed": true
              },
              "url": {
                "type": "text",
                "similarity": "boolean",
                "analyzer": "url_split_analyzer"
              }
            }
          .......

I stumbled upon some documents in the index that I cannot find when I query the index by the id_domain property.

For example:

GET /myindex/mytype/_search
{ 
    "query": {
      "bool": {
        "must": [
          {
            "match": { "active": 1 }
          }
        ]
      }
    }
}

Sample output:

{
    "_index": "myindex",
    "_type": "mytype",
    "_id": "myurl",
    "_score": 1,
    "_source": {
        "id_domain": "73993",
        "active": 1,
        "url": "myurl",
        "@timestamp": "2018-05-21T10:55:16.247Z"
    }
}
....

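This returns a list of documents whose id_domain I can see in _source, yet I cannot find them when I query on that id_domain directly, like this: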

GET /myindex/mytype/_search
{
  "query": {
      "match": {
        "id_domain": 73993 // same result with or without quotes
      }
  }
}

Output:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

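I can't understand why this happens. I also tried reindexing the index, but I got the same result.

I'm sure I'm missing something. Is there a reason for this behavior?

Thanks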

1 Answer:

Answer 0 (score: 0)

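In your mapping, id_domain has type short, but the value in your document, 73993, is outside the range of a short (-32,768 to 32,767). Because the mapping also sets ignore_malformed to true, the out-of-range value is most likely skipped at index time instead of raising an error: it still appears in _source, but nothing is indexed for the query to match.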
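One way to check this, assuming the out-of-range values were skipped at index time rather than rejected, is to search for documents that have no indexed value for id_domain. The request below is only a sketch in the same console syntax as the question. It would also match documents that never had an id_domain at all, so the hits need to be compared against their _source.

```
GET /myindex/mytype/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "id_domain" } }
      ]
    }
  }
}
```

If a hit shows an id_domain in _source that is larger than 32,767, that would be consistent with the value having been dropped from the index.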

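You need to change the type to integer (range -2,147,483,648 to 2,147,483,647), and everything should work. The type of an existing field cannot be changed in place, so this means creating a new index with the corrected mapping and reindexing the documents into it; reindexing into the same mapping reproduces the problem.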
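A minimal sketch of that fix in console syntax follows. The index name myindex_v2 is hypothetical, and the settings and fields are copied from the question; any fields elided there would need to be added to the mapping too. The lines starting with # are annotations, not part of the requests.

```
# 1. Create a new index (hypothetical name) with id_domain mapped as integer.
PUT myindex_v2
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "analysis": {
        "analyzer": {
          "url_split_analyzer": {
            "filter": ["lowercase"],
            "tokenizer": "url_split"
          }
        },
        "tokenizer": {
          "url_split": {
            "type": "pattern",
            "pattern": "[^a-zA-Z0-9]"
          }
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "@timestamp": { "type": "date" },
        "active": { "type": "short" },
        "id_domain": { "type": "integer", "ignore_malformed": true },
        "url": {
          "type": "text",
          "similarity": "boolean",
          "analyzer": "url_split_analyzer"
        }
      }
    }
  }
}

# 2. Copy the documents; id_domain is parsed again from _source under the new mapping.
POST _reindex
{
  "source": { "index": "myindex" },
  "dest": { "index": "myindex_v2" }
}

# 3. Run the original query against the new index.
GET /myindex_v2/mytype/_search
{
  "query": {
    "match": { "id_domain": 73993 }
  }
}
```

The sketch keeps ignore_malformed as it was in the question. Removing it should make out-of-range values fail at index time instead of being skipped quietly, which would make this kind of problem easier to spot.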