Question

My index contains documents like this:

{
  "field" : "a, b, c, d, e"
}

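The field value is a string produced by an array-to-string function. So not every document has the same string, but every document's value contains at least "a, b".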

Now I want a query that matches two kinds of documents:

Documents whose field value is exactly "a, b", or documents whose field contains at least two of the searched characters.

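Basically, my problem is that if the field is analyzed I cannot satisfy the first condition, and if the field is not analyzed I cannot satisfy the second. Is there a solution that does not involve cloning the field as not_analyzed?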
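To make the conflict concrete, here is a rough Python sketch of which terms end up in the index in each case. The `analyze` helper is only an approximation of a default analyzer (it lowercases and splits on non-alphanumeric characters) and is not Elasticsearch code. `term_matches` is likewise a hypothetical stand-in for a term filter, which compares the search value against the indexed terms without analyzing it.

```python
import re

def analyze(value):
    """Rough stand-in for an analyzed string field: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def not_analyzed(value):
    """A not_analyzed field indexes the whole value as one single term."""
    return [value]

def term_matches(indexed_terms, search_value):
    """A term filter looks for the search value verbatim among the indexed terms."""
    return search_value in indexed_terms

doc = "a, b, c, d, e"

# Analyzed: single characters are searchable, the full string "a, b" is not.
analyzed_terms = analyze(doc)                      # ['a', 'b', 'c', 'd', 'e']
can_find_c = term_matches(analyzed_terms, "c")
can_find_exact = term_matches(analyze("a, b"), "a, b")

# Not analyzed: the full string is searchable, single characters are not.
raw_terms = not_analyzed(doc)                      # ['a, b, c, d, e']
raw_find_c = term_matches(raw_terms, "c")
raw_find_exact = term_matches(not_analyzed("a, b"), "a, b")
```

Under these assumptions each indexing mode satisfies exactly one of the two conditions, which is why a single field cannot serve both.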

If I clone the field into a not-analyzed field (field1 in the code sample), I can use this query. It feels far too complicated to me, though:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "or": [
          {
            "term": {
              "field1": "a, b"
            }
          },
          {
            "and": [
              {
                "term": {
                  "field": "c"
                }
              },
              {
                "term": {
                  "field": "d"
                }
              }
            ]
          }
        ]
      }
    }
  }
}

Answer 1

You can use a multi-field mapping. This lets you send the field once but have it analyzed in two different ways.

"properties": {
  "field": {
    "type": "multi_field",
    "fields": {
      "field": {"type": "string", "index": "analyzed"},
      "raw": {"type": "string", "index": "not_analyzed"}
    }
  }
}

Send the document to Elasticsearch as usual. It will be indexed in two places: field (or field.field) and field.raw.

Now your query will look like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "or": [
          {
            "term": {
              "field.raw": "a, b"
            }
          },
          {
            "and": [
              {
                "term": {
                  "field": "c"
                }
              },
              {
                "term": {
                  "field": "d"
                }
              }
            ]
          }
        ]
      }
    }
  }
}
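If the exact value and the list of required characters vary per search, the query body above can be generated rather than written by hand. The following is a minimal sketch that only builds the request body as a Python dict, using the same filtered/or/and syntax as the query above; `build_query` and its parameters are hypothetical names, and nothing is sent to a cluster.

```python
import json

def build_query(exact_value, required_terms,
                analyzed_field="field", raw_field="field.raw"):
    """Exact match on the raw sub-field OR all required terms on the analyzed field."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "or": [
                        # Clause 1: the whole string, compared against the not_analyzed copy.
                        {"term": {raw_field: exact_value}},
                        # Clause 2: one term filter per required character, all must match.
                        {"and": [{"term": {analyzed_field: t}} for t in required_terms]},
                    ]
                },
            }
        }
    }

body = build_query("a, b", ["c", "d"])
print(json.dumps(body, indent=2))
```

Calling `build_query("a, b", ["c", "d"])` reproduces the structure of the hand-written query, and the list of required terms can grow without further nesting by hand.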

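This is not the most elegant solution. I would prefer to change how the data is stored. It seems that "a, b" means something different from the other values, so perhaps the documents could carry a boolean field "a_b_only" to filter on.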

Good luck, and feel free to ask for more help!

Answer 2

Elasticsearch 1.x does not support the multi_field type; use this instead:

"title": {
  "type": "string",
  "fields": {
    "raw": {"type": "string", "index": "not_analyzed"}
  }
}

For details, see the Elasticsearch 1.7 docs on multi-fields.

Answer 3

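Out of curiosity, why are you building a string from your array in the first place? A field in an ES document can hold multiple values, and you can query them with the "terms" filter: http://www.elasticsearch.org/guide/reference/query-dsl/terms-filter/. So instead of your original field data: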

{
  "field1" : "a, b, c, d, e"
}

you would simply store it as an array, like this:

{
  "field1" : ["a", "b", "c", "d", "e"]
}

Then you would query with something like this (careful, this is untested!):

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "or": [
          {
            "terms": {
              "field1": ["a", "b"],
              "execution": "and"
            }
          },
          {
            "terms": {
              "field1": ["c", "d"],
              "execution": "and"
            }
          }
        ]
      }
    }
  }
}
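As a way to reason about what this query would select, here is a small Python sketch that models each array field as a set of values. `terms_and` is a hypothetical helper that mimics a terms filter with "execution": "and" (every listed value must be present); it is not an Elasticsearch API, and the sample documents are made up for illustration.

```python
def terms_and(field_values, wanted):
    """Mimic a terms filter with execution "and": every wanted value must be present."""
    return set(wanted).issubset(field_values)

def matches(field_values):
    """OR of the two clauses from the query above."""
    return terms_and(field_values, ["a", "b"]) or terms_and(field_values, ["c", "d"])

# Hypothetical sample documents, keyed by a made-up name.
docs = {
    "only_ab": ["a", "b"],
    "all_five": ["a", "b", "c", "d", "e"],
    "ab_plus_e": ["a", "b", "e"],
}

results = {name: matches(values) for name, values in docs.items()}
```

In this model the first clause is a containment check rather than an equality check, so a document such as "ab_plus_e" matches as well. If "exactly a, b" has to exclude such documents, some extra signal would be needed, for example the boolean "a_b_only" field suggested in Answer 1.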

As a final note, I assume your real data requires 'field1' to be set to 'not_analyzed'.

Elasticsearch exact match or query
