Getting an error for specific queries

Question

I'm new to Lucene. I'm using it with Hibernate in a Java client, and I get this error on one particular query:

HSEARCH000146: The query string 'a' applied on field 'name' has no meaningfull tokens to  
be matched. Validate the query input against the Analyzer applied on this field.

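Search works for all other queries, even ones that return an empty result set. My test database does contain a record with 'a'. What could be wrong here?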

Answer 1

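'a' is a stop word, and StandardAnalyzer filters it out of your query. Stop words are words that are so common in the language you search in that they are not considered meaningful for generating search results. The list is short, but 'a' is on it for English.

Because the analyzer removed that term, and it was the only term present, you are now sending an empty query. An empty query is not acceptable, so the search fails.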
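
To make the failure mode concrete, here is a minimal plain-Java sketch of what a stop filter does to the input. It is an illustration only, not Lucene code: the class name `StopWordSketch`, the method `analyze`, and the five-word sample stop set are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in for an analyzer with a stop filter; NOT Lucene code.
final class StopWordSketch {

    // A small sample of English stop words, chosen for this example only.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    // Lowercase the input, split on non-alphanumeric characters, drop stop words.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("a"));        // no terms left, so nothing to match
        System.out.println(analyze("a record")); // one term survives
    }
}
```

A query of just 'a' leaves no terms at all, which corresponds to the situation the HSEARCH000146 message reports.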

For the curious, these are the standard Lucene English stop words:

"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"

If you don't want stop words removed, set up your Analyzer without a StopFilter, or with an empty stop word set. For StandardAnalyzer, you can pass a custom stop set to the constructor:

Analyzer analyzer = new StandardAnalyzer(CharArraySet.EMPTY_SET);

Answer 2

You can put

@Analyzer(impl = KeywordAnalyzer.class)

on your field to avoid this problem.

Answer 3

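Proposed solution

@femtoRgon already explained why this error occurs. The same problem appears when you split the user input into a list of strings and then feed each string to a Hibernate Search query. If one of those strings is a stop word, Hibernate does not know what to do with it.

However, you can parse and validate the input with the same analyzer before you send it to the Hibernate Search query. This way you extract the same words from the input that the analyzer would, and you avoid the error without switching to a different Analyzer class.

Retrieve the current analyzer from your entity class MyModelClass.class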

FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search
    .getFullTextEntityManager(entityManager);

QueryBuilder builder = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity(MyModelClass.class).get();

Analyzer customAnalyzer = fullTextEntityManager.getSearchFactory()
    .getAnalyzer(MyModelClass.class);

Tokenize the input

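import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Run the input through the analyzer and return the terms that survive.
 * @param analyzer the analyzer applied to the searched field
 * @param string the raw user input
 * @return the analyzed terms in order; empty if nothing survives analysis
 */
public static List<String> tokenizeString(Analyzer analyzer, String string)
{
    List<String> result = new ArrayList<String>();
    // try-with-resources closes the stream even if tokenizing fails
    try (TokenStream stream = analyzer.tokenStream(null, new StringReader(string)))
    {
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken())
        {
            result.add(term.toString());
        }
        stream.end(); // finish the stream before it is closed
    } catch (IOException e)
    {
        throw new RuntimeException(e);
    }
    return result;
}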

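Validate the input

Now you can simply run your input string through the same Analyzer and receive a list of properly tokenized strings, like this:

List<String> keywordsList = tokenizeString(customAnalyzer, "This is a sentence full of the evil stopwords");

and you will receive this list: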

[this, sentence, full, evil, stopwords]
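
Once the input is tokenized, the remaining step is to skip the search when nothing survives analysis. The following is a minimal sketch of such a guard, assuming the list comes from tokenizeString. `TermGuard` and `hasSearchableTerms` are hypothetical names introduced here; they are not part of Hibernate Search or Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Hibernate Search or Lucene.
final class TermGuard {

    // True only if at least one non-blank term survived analysis.
    static boolean hasSearchableTerms(List<String> terms) {
        if (terms == null) {
            return false;
        }
        for (String term : terms) {
            if (term != null && !term.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins for what tokenizeString might return.
        List<String> onlyStopWords = Collections.emptyList();
        List<String> mixed = Arrays.asList("sentence", "full", "evil", "stopwords");

        System.out.println(hasSearchableTerms(onlyStopWords)); // false: skip the query
        System.out.println(hasSearchableTerms(mixed));         // true: safe to build the query
    }
}
```

When the check returns false, the caller can return an empty result instead of building a query that would trigger the error.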

I based my answer on two other SO posts.
