Question

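I am trying to use Whoosh for text search.

When I search for a string that contains a hyphen (for example 'IGF-1R'), the query ends up searching for 'IGF' and '1R' separately instead of treating it as a single string.

Any idea why?

Here is the code I am using: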

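from whoosh.qparser import QueryParser
from whoosh.query import FuzzyTerm

class MyFuzzyTerm(FuzzyTerm):
    # FuzzyTerm subclass that only sets its own default arguments
    def __init__(self, fieldname, text, boost=1.0, maxdist=1, prefixlength=2, constantscore=True):
        super(MyFuzzyTerm, self).__init__(fieldname, text, boost, maxdist, prefixlength, constantscore)

# ix is an index that has already been opened
with ix.searcher() as searcher:
    qp = QueryParser("gene", schema=ix.schema, termclass=MyFuzzyTerm)
    q = qp.parse('IGF-1R')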

q returns:

And([MyFuzzyTerm('gene', 'igf', boost=1.000000, maxdist=1, prefixlength=2), MyFuzzyTerm('gene', '1r', boost=1.000000, maxdist=1, prefixlength=2)])

I would like it to be:

MyFuzzyTerm('gene', 'igf-1r', boost=1.000000, maxdist=1, prefixlength=2)

Answer 1

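Splitting text into words is the job of the tokenizer. I usually use whoosh.analysis.SpaceSeparatedTokenizer(), but in your case the tokenizer is splitting on both whitespace and dashes.
So I bet you are using whoosh.analysis.CharsetTokenizer(charmap) or whoosh.analysis.RegexTokenizer(expression=<_sre.SRE_Pattern object>, gaps=False), with space and dash treated as separators by the charmap or the expression.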
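To see why the hyphen splits the term, the tokenization can be imitated with plain `re`. This is only a sketch, not Whoosh itself: `DEFAULT_PATTERN` is the expression that, as far as I know, `RegexTokenizer` uses when none is passed, while `HYPHEN_PATTERN` and the `tokenize` helper are hypothetical names made up for this illustration.

```python
import re

# Expression RegexTokenizer falls back to by default (assumption: unchanged in
# your installed Whoosh version): word characters, optionally joined by dots.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")

# Hypothetical alternative that also lets a single hyphen join word characters.
HYPHEN_PATTERN = re.compile(r"\w+([.-]?\w+)*")


def tokenize(pattern, text):
    """Return the lowercased matches of pattern in text, in order."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


default_tokens = tokenize(DEFAULT_PATTERN, "IGF-1R")
hyphen_tokens = tokenize(HYPHEN_PATTERN, "IGF-1R")
print(default_tokens)  # the hyphen acts as a separator
print(hyphen_tokens)   # the hyphen stays inside the token
```

In Whoosh itself, the corresponding change would presumably be to give the "gene" field an analyzer built from RegexTokenizer with such an expression (or from SpaceSeparatedTokenizer()) piped into LowercaseFilter(), and then to reindex. The query parser runs the field's analyzer over the query text, so the indexed terms and the query terms have to be produced by the same tokenizer.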

Search query converts '-' into AND
