Question

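I have indexed a lot of documents in Elasticsearch. I built a query that finds many of them, but it does not find a document whose name is, for example, "hello + friend". If I search for hello, the document is found, but if I search for the actual file.name "hello + friend", it is not. Is my query wrong? The same happens with file names in other languages (e.g. Chinese).

Thanks for the help.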

$params = [
    'index' => 'search_dokumentation',
    'type' => 'document',
    'size' => 500,
    'body' => [
        'query' => [
            'bool' => [
                'should' => [
                    // Term-level wildcard: compared against each indexed term of file.name
                    'wildcard' => [
                        'file.name' => '*' . strtolower($searchTerm) . '*',
                    ],
                ],
                'minimum_should_match' => 1,
            ],
        ],
        'sort' => [
            '_score' => [
                'order' => 'asc',
            ],
        ],
    ],
];

"mappings": {

  "meta": {
    "_all": {
      "enabled": false
    },
    "properties": {
      "last_modified": {
        "type": "date",
        "format": "yyy-MM-dd HH:mm:ss"
      },
      "update_date": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  },
  "document": {
    "_all": {
      "enabled": false
    },
    "_source": {
      "excludes": [
        "file.content_base64"
      ]
    },
    "properties": {
      "article": {
        "properties": {
          "number": {
            "type": "keyword"
          }
        }
      },
      "file": {
        "properties": {
          "content_base64": {
            "type": "text"
          },
          "create_date": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "created": {
            "type": "date",
            "format": "yyy-MM-dd HH:mm:ss"
          },
          "extension": {
            "type": "keyword"
          },
          "last_accessed": {
            "type": "date",
            "format": "yyy-MM-dd HH:mm:ss"
          },
          "last_modified": {
            "type": "date",
            "format": "yyy-MM-dd HH:mm:ss"
          },
          "link_file": {
            "type": "keyword"
          },
          "link_folder": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "decompound": {
                "type": "text",
                "analyzer": "my_decompound"
              },
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "simple": {
                "type": "text",
                "analyzer": "simple"
              }
            }
          },
          "path_file": {
            "type": "keyword"
          },
          "path_folder": {
            "type": "keyword"
          },
          "path_folder_short": {
            "type": "keyword"
          },
          "permissions": {
            "type": "long"
          },
          "size": {
            "type": "long"
          },
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "relation": {
        "properties": {
          "machine": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "plant": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Answer 1

From your mapping, it looks like file.name uses the standard analyzer, so a match query should work for you. Here is an example:

PUT newindex/_doc/1
{
  "file.name": "hello + friend"
}

GET newindex/_doc/_search
{
  "query": {
    "match": {
      "file.name": "hello + friend"
    }
  }
}
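For the PHP client used in the question, the same match query might look roughly like the sketch below. The helper name buildMatchParams is hypothetical and only there for illustration; the index and type names are copied from the question. The 'operator' => 'and' setting is my own assumption: it asks for every analyzed term to match instead of any one of them, so drop it if you want the default behaviour.

```php
<?php
// Hypothetical helper (not from the original post): builds the search
// parameters for a match query on file.name.
function buildMatchParams(string $searchTerm): array
{
    return [
        'index' => 'search_dokumentation', // taken from the question
        'type'  => 'document',             // taken from the question
        'size'  => 500,
        'body'  => [
            'query' => [
                'match' => [
                    'file.name' => [
                        // The search text is analyzed the same way as the field.
                        'query'    => $searchTerm,
                        // Assumption: require all analyzed terms to match.
                        'operator' => 'and',
                    ],
                ],
            ],
        ],
    ];
}

$params = buildMatchParams('hello + friend');
```

The resulting array would be passed to the client's search call in place of the wildcard parameters from the question.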

The standard analyzer actually drops the special characters. So if you run _analyze on the text "hello + friend", you will see that it is split into two terms.

GET _analyze
{
  "text": ["hello + friend"],
  "analyzer": "standard"
}

Result:

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "friend",
      "start_offset": 8,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
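The token list above is also why the wildcard query from the question comes back empty: a wildcard pattern is compared against each indexed term separately, and neither "hello" nor "friend" contains " + ". The sketch below only imitates that behaviour in plain PHP. roughTokens is a hypothetical, very rough stand-in for the standard analyzer (the real one follows Unicode segmentation rules and differs in details), and fnmatch stands in for the wildcard comparison.

```php
<?php
// Rough stand-in for the standard analyzer (an assumption, not the real
// implementation): lowercase, then split on anything that is not a
// letter or a digit.
function roughTokens(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Imitates a term-level wildcard: the pattern has to match one whole
// indexed term; it is never compared against the original string.
function anyTermMatches(string $pattern, array $terms): bool
{
    foreach ($terms as $term) {
        if (fnmatch($pattern, $term)) {
            return true;
        }
    }
    return false;
}

$terms = roughTokens('hello + friend');               // ['hello', 'friend']
var_dump(anyTermMatches('*hello*', $terms));          // bool(true)
var_dump(anyTermMatches('*hello + friend*', $terms)); // bool(false)
```

Your mapping also has a file.name.keyword subfield, which keeps the whole name as a single unanalyzed term. A wildcard against that subfield would see the "+", but it would compare against the name exactly as stored, including upper and lower case, so I have not tested whether it fits your data.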

Edit:

For the use case of matching the file name "Betriebsanleitung_Schere + Stangenmagazin_V3.5.pdf" with a partial term (the term "stangen"), you can use query_string with wildcards.

{
  "query": {
    "query_string": {
      "query": "*stangen*"
    }
  }
}
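In the PHP client this could be written along the lines of the sketch below. The helper name buildPartialNameParams is hypothetical. Restricting the search to file.name through default_field is my assumption about what you want; the query above searches the default fields instead. Lowercasing the fragment simply mirrors what the code in the question already does.

```php
<?php
// Hypothetical helper (not from the original post): wraps a single word
// fragment in wildcards and searches it with query_string.
function buildPartialNameParams(string $fragment): array
{
    return [
        'index' => 'search_dokumentation', // taken from the question
        'type'  => 'document',             // taken from the question
        'size'  => 500,
        'body'  => [
            'query' => [
                'query_string' => [
                    // Indexed terms are lowercased by the standard analyzer,
                    // so the fragment is lowercased as in the question.
                    'query'         => '*' . strtolower($fragment) . '*',
                    // Assumption: only the file name should be searched.
                    'default_field' => 'file.name',
                ],
            ],
        ],
    ];
}

$params = buildPartialNameParams('Stangen');
```

Keep in mind that query_string parses its input, so characters such as + - : ( ) have a meaning in the query syntax. Passing raw user input like "hello + friend" into this helper would not be treated as literal text; it is meant for a single word fragment. Leading wildcards can also be slow on large indices.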
