Question

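When dealing with exact matches, I'm given a real-world query like this:

"not in education, employment, or training"

Converted to a Lucene query with stop words removed, this gives: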

+Content:"? ? education employment ? training"

Here's a more contrived example:

"there is no such thing"

Converted to a Lucene query with stop words removed, this gives:

+Content:"? ? ? ? thing"
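To illustrate where the "?" placeholders come from, here is a minimal sketch that does not use Lucene at all. It assumes a small hand-picked stop list and a naive split on spaces and punctuation; `StopWordDemo` and `ToPhrasePattern` are hypothetical names invented for this example, and the real analyzer's tokenization is more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordDemo
{
    // Hand-picked stop list for this sketch only (an assumption, not Lucene's full default set).
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "not", "in", "or", "there", "is", "no", "such"
        };

    // Replaces every stop word with "?" so that its position in the phrase is kept.
    public static string ToPhrasePattern(string input)
    {
        var tokens = input.Split(
            new[] { ' ', ',', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        return string.Join(
            " ",
            tokens.Select(t => StopWords.Contains(t) ? "?" : t.ToLowerInvariant()));
    }

    public static void Main()
    {
        Console.WriteLine(ToPhrasePattern("not in education, employment, or training"));
        Console.WriteLine(ToPhrasePattern("there is no such thing"));
    }
}
```

In this sketch, any other run of four stop words followed by "thing" (for example "no such is there thing") produces the same pattern as the second query, which is why such a phrase cannot distinguish the user's exact wording.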

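My goal is to have searches like these match only the exact phrase as the user entered it.

Could one solution be to clear the list of stop words? Would this have adverse effects? If so, what? (My google-fu failed.)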

Answer 1

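This all depends on the analyzer you are using. StandardAnalyzer uses stop words and strips them out; in fact, StopAnalyzer is where StandardAnalyzer gets its stop words from.

Use WhitespaceAnalyzer, or create your own analyzer by inheriting from the one that best suits your needs and modifying it to do what you want.

Alternatively, if you like StandardAnalyzer, you can new one up with a custom stop word list: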

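// Namespaces this snippet relies on (they belong at the top of the file)
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// The default stop word list, in case you want to reuse or filter it
var defaultStopWords = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

// Create a new StandardAnalyzer with a custom stop word list
var sa = new StandardAnalyzer(
    Version.LUCENE_29, // depends on your version
    new HashSet<string> // pass in your own stop word list
    {
        "hello",
        "world"
    });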
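
To come back to the question of clearing the list: assuming the same constructor overload as in the snippet above (the exact signature varies between Lucene.Net versions), passing an empty set should leave every word in place. This is only a sketch; `AnalyzerFactory` and `CreateWithoutStopWords` are hypothetical names. Keep in mind that the same analyzer has to be used for indexing and for querying, so existing content would need to be reindexed, and an index that keeps very common words will generally be larger.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

public static class AnalyzerFactory
{
    // Hypothetical helper: builds a StandardAnalyzer that filters out no stop words.
    public static StandardAnalyzer CreateWithoutStopWords()
    {
        return new StandardAnalyzer(
            Version.LUCENE_29,       // depends on your version
            new HashSet<string>());  // empty stop word list: nothing is removed
    }
}
```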
