Question

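I'm trying to do full-text search on some fields of my documents, and I'm looking for advice on how to go about it. I first tried this kind of request: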

GET http://localhost:8080/search/?query=lord+of+the+rings

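But it only returns documents where the field is an exact match, that is, where the field contains nothing other than the given string. So I tried the equivalent in YQL: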

GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text CONTAINS "lord of the rings";

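I got exactly the same results. Reading further in the documentation, however, I came across the MATCHES operator, which does give me the results I'm after with a request like this: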

GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";

For some requests of this kind, though, I get a timeout error like the one below, and I don't know why:

{
    "root": {
        "id": "toplevel",
        "relevance": 1,
        "fields": {
            "totalCount": 0
        },
        "errors": [
            {
                "code": 12,
                "summary": "Timed out",
                "source": "site",
                "message": "Timeout while waiting for sc0.num0"
            }
        ]
    }
}

So I worked around the problem by passing a timeout value larger than the default:

GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";&timeout=20000

My question is: am I doing full-text search the right way, and how could I improve it?

EDIT: Here is the corresponding search definition:

search site {

    document site {

        field text type string {
            stemming: none
            normalizing: none
            indexing: attribute
        }

        field title type string {
            stemming: none
            normalizing: none
            indexing: attribute
        }
    }

    fieldset default {
        fields: title, text
    }

    rank-profile post inherits default {
        rank-type text: about
        rank-type title: about
        first-phase {
            expression: nativeRank(title, text)
        }
    }
}

Answer 1

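What does your search definition file look like? I suspect you have put the text content in an "attribute" field, which defaults to "word match" semantics. You probably want "text match" semantics, which means you need to put the content in an "index" type field.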

https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#match
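
Applied to the definition in the question, the suggestion above might look like the fragment below. This is a sketch under assumptions, not a verified configuration: it changes only the `indexing` line of each field, leaves everything else as posted, and adds `summary` on the assumption that the field contents should be returned in hits.

```
field text type string {
    stemming: none
    normalizing: none
    # "index" builds a text index, giving tokenized text-match semantics
    # instead of the word-match default of an attribute.
    # "summary" is an assumption: include it only if hits should return the field.
    indexing: index | summary
}

field title type string {
    stemming: none
    normalizing: none
    indexing: index | summary
}
```

A change from attribute to index will likely require redeploying the application and feeding the documents again before queries see the new behavior.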

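The "MATCHES" operator you are using interprets your input as a regular expression. That is powerful but slow, because it applies the regular expression to all attributes (further optimizations along the lines of https://swtch.com/~rsc/regexp/regexp4.html are possible but are not currently implemented).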
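
As a sketch of what a query without a regular expression could look like, assuming the `text` and `title` fields are indexed with text-match semantics: the YQL `phrase()` function states the phrase intent explicitly, and the simple query form searches the `default` fieldset from the question. These requests have not been run against this application.

```
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text CONTAINS phrase("lord", "of", "the", "rings");

GET http://localhost:8080/search/?query=lord+of+the+rings
```

The first request matches documents whose `text` field contains the four words as a consecutive phrase. The second matches on the individual terms in either `title` or `text`, without requiring them to be adjacent.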
