Question

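Imagine I have an index containing the following three documents, which represent images and their colors.

[
  { "id": 1, "intensity": { "red": 0.6, "green": 0.1, "blue": 0.3 } },
  { "id": 2, "intensity": { "red": 0.5, "green": 0.6, "blue": 0.0 } },
  { "id": 3, "intensity": { "red": 0.98, "green": 0.0, "blue": 0.0 } }
]

If the user wants a "red image" (chosen from a drop-down or a "tag cloud"), a range query on the float (something like intensity.red > 0.5) is very convenient. I can even use the score of that query to rank the "reddest" images at the top. However, offering free-text search is harder. My solution is to index the documents as follows, deriving the colors at index time (for example with if color > 0.5 then append(colors, color_name)):

[
  { "id": 1, "colors": ["red"] },
  { "id": 2, "colors": ["green"] },
  { "id": 3, "colors": ["red"] }
]

I can now run a query_string or match query on the colors field and search for "red", but suddenly I lose the ability to rank. ID 3 is far redder than ID 1 (0.98 versus 0.6), yet the two would score about the same?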
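As a rough illustration of the index-time rule from the question (append a color name when its intensity exceeds 0.5), here is a minimal Python sketch. The helper name `to_color_tags` and the alphabetical ordering of the tags are assumptions made for this example, not part of the original post. Note that with a strict `>` comparison, image 2 (red = 0.5) is tagged as green only.

```python
# Hypothetical helper: derive a "colors" keyword list from per-channel intensities.
THRESHOLD = 0.5  # cutoff taken from the question


def to_color_tags(doc, threshold=THRESHOLD):
    """Return a document with a 'colors' list in place of 'intensity'."""
    colors = sorted(
        name for name, value in doc["intensity"].items() if value > threshold
    )
    return {"id": doc["id"], "colors": colors}


images = [
    {"id": 1, "intensity": {"red": 0.6, "green": 0.1, "blue": 0.3}},
    {"id": 2, "intensity": {"red": 0.5, "green": 0.6, "blue": 0.0}},
    {"id": 3, "intensity": {"red": 0.98, "green": 0.0, "blue": 0.0}},
]

tagged = [to_color_tags(image) for image in images]
for doc in tagged:
    print(doc)
```

The sketch also makes the ranking problem visible: once the intensities are reduced to tags, images 1 and 3 become indistinguishable for a search on "red".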

My question is: can I have my cake and eat it too?

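One solution I see is to maintain a separate index that translates free text into "keywords", which are then used in the actual search.

But then I would have to fire two searches for every user search. Is that perhaps the only option?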

Answer 1

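You can combine the nested data type with a function_score query to get the desired result.

You need to change the way the image data is stored. The mapping would be as follows: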

PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "integer"
        },
        "image": {
          "type": "nested",
          "properties": {
            "color": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              }
            },
            "intensity": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

Index the image data as follows:

PUT test/_doc/1
{
  "id": 1,
  "image": [
    {
      "color": "red",
      "intensity": 0.6
    },
    {
      "color": "green",
      "intensity": 0.1
    },
    {
      "color": "blue",
      "intensity": 0.3
    }
  ]
}

The above corresponds to the first image in your question. The other images can be indexed in the same way.
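To index the remaining images, the documents from the question have to be reshaped into the nested layout. A minimal Python sketch of that conversion follows. The function name `to_nested_doc` is hypothetical, and the sketch only builds the request bodies; how they are sent to Elasticsearch is left open.

```python
import json


def to_nested_doc(doc):
    """Convert {"id", "intensity": {color: value}} into the nested 'image' layout."""
    return {
        "id": doc["id"],
        "image": [
            {"color": name, "intensity": value}
            for name, value in doc["intensity"].items()
        ],
    }


images = [
    {"id": 1, "intensity": {"red": 0.6, "green": 0.1, "blue": 0.3}},
    {"id": 2, "intensity": {"red": 0.5, "green": 0.6, "blue": 0.0}},
    {"id": 3, "intensity": {"red": 0.98, "green": 0.0, "blue": 0.0}},
]

nested_docs = [to_nested_doc(image) for image in images]

# Each entry is a body suitable for PUT test/_doc/<id>.
for body in nested_docs:
    print(json.dumps(body))
```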

Now, when a user searches for red, the query should be built as follows:

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "image",
            "query": {
              "function_score": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "match": {
                          "image.color": "red"
                        }
                      },
                      {
                        "range": {
                          "image.intensity": {
                            "gt": 0.5
                          }
                        }
                      }
                    ]
                  }
                },
                "field_value_factor": {
                  "field": "image.intensity",
                  "modifier": "none",
                  "missing": 0
                }
              }
            }
          }
        }
      ]
    }
  }
}

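As you can see in the query above, I use the value of the image.intensity field to calculate the score. With the default boost_mode of function_score (multiply), the match score is multiplied by the intensity, so a more intense color yields a higher score.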
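Since the color term comes from user input, the query body is usually built programmatically. The following Python sketch produces the same structure as the query above for an arbitrary color. The function name `build_color_query` and its `min_intensity` parameter are assumptions for illustration; the resulting dictionary would then be sent as the body of a search request against the test index.

```python
import json


def build_color_query(color, min_intensity=0.5):
    """Build a nested function_score query that ranks matches by color intensity."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "image",
                            "query": {
                                "function_score": {
                                    # Keep only nested entries of the requested
                                    # color that exceed the intensity threshold.
                                    "query": {
                                        "bool": {
                                            "must": [
                                                {"match": {"image.color": color}},
                                                {"range": {"image.intensity": {"gt": min_intensity}}},
                                            ]
                                        }
                                    },
                                    # Use the stored intensity as the score factor.
                                    "field_value_factor": {
                                        "field": "image.intensity",
                                        "modifier": "none",
                                        "missing": 0,
                                    },
                                }
                            },
                        }
                    }
                ]
            }
        }
    }


query = build_color_query("green")
print(json.dumps(query, indent=2))
```

Keeping the threshold as a parameter makes it easy to experiment with a looser or stricter definition of what counts as a given color.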

Sorted results based on dynamic terms