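I am using the whoosh package to do fuzzy searching in Python. Is there any way to get the edit distance of a match returned along with the results?
My code is as follows: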
import os
import os.path

from whoosh.fields import Schema, TEXT
from whoosh.index import create_in
from whoosh.qparser import QueryParser, FuzzyTermPlugin, PhrasePlugin, SequencePlugin
from whoosh.query import FuzzyTerm


class MyFuzzyTerm(FuzzyTerm):
    # Same as FuzzyTerm, but with the default maximum edit distance raised to 5.
    def __init__(self, fieldname, text, boost=1.0, maxdist=5, prefixlength=1, constantscore=True):
        super(MyFuzzyTerm, self).__init__(fieldname, text, boost, maxdist, prefixlength, constantscore)


# Create the index directory on first run.
if not os.path.exists("indexdir"):
    os.mkdir("indexdir")

# Read the single document to be indexed.
path = u"MMM2.txt"
with open(path, 'r') as f:
    content = f.read()

# Only the file name is stored; the content is indexed but not stored.
schema = Schema(name=TEXT(stored=True), content=TEXT)
ix = create_in("indexdir", schema)
writer = ix.writer()
writer.add_document(name=path, content=content)
writer.commit()

with ix.searcher() as searcher:
    # Use MyFuzzyTerm for every term so that all terms are matched fuzzily.
    parser = QueryParser(u"content", ix.schema, termclass=MyFuzzyTerm)
    parser.add_plugin(FuzzyTermPlugin())
    # Replace the phrase plugin with the sequence plugin.
    parser.remove_plugin_class(PhrasePlugin)
    parser.add_plugin(SequencePlugin())
    query = parser.parse(u"\"Tennessee Riverkeep Inc\"~")
    results = searcher.search(query)
    print("nb of results =", len(results))
    for r in results:
        print(r)
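The phrase I am searching for is "Tennessee Riverkeep Inc". The phrase that actually appears in the file is "Tennessee Riverkeeper Inc", so the distance in this case is 2. I set the maximum distance to 5. Is there any way to get the number 2 back in this case?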
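To make clear which number I mean: by "distance" I mean the Levenshtein edit distance between the query text and the text in the file. The sketch below is a plain-Python helper, not part of whoosh (the name `edit_distance` is my own, hypothetical), and it only shows how the value 2 comes about. As far as I understand, whoosh's fuzzy matching compares individual terms rather than the whole phrase, so the per-term distance would be the one between "Riverkeep" and "Riverkeeper".

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b.

    Hypothetical helper for illustration; not a whoosh API.
    """
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


query_text = "Tennessee Riverkeep Inc"
indexed_text = "Tennessee Riverkeeper Inc"
print(edit_distance(query_text, indexed_text))  # prints 2
print(edit_distance("Riverkeep", "Riverkeeper"))  # prints 2
```

Computing this myself after the search would require knowing which text in the file actually matched, which is why I would prefer to get the distance from whoosh directly.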