Per the Whoosh documentation here, giving the StemmingAnalyzer an unbounded cache makes batch indexing faster:
writer = myindex.writer()
# Get the analyzer object from a text field
stem_ana = writer.schema["content"].format.analyzer
# Set the cachesize to -1 to indicate unbounded caching
stem_ana.cachesize = -1
# Reset the analyzer to pick up the changed attribute
stem_ana.clear()
# Use the writer to index documents...
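The pattern the docs rely on is that a cachesize attribute only takes effect once clear() rebuilds the cache. The standard library can sketch this. It is a minimal, hypothetical illustration, not Whoosh's code: toy_stem and CachedStemmer are made-up names, the bounded default of 1000 is arbitrary, and functools.lru_cache(maxsize=None) stands in for whatever cache Whoosh uses internally, with -1 mapped to "unbounded".

```python
from functools import lru_cache


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class CachedStemmer:
    """Hypothetical wrapper mirroring the docs' cachesize / clear() pattern."""

    def __init__(self, stemfn, cachesize=1000):  # 1000 is an arbitrary bounded default
        self.stemfn = stemfn
        self.cachesize = cachesize
        self.clear()

    def clear(self):
        # Rebuild the cache from the current cachesize; -1 means unbounded,
        # which lru_cache expresses as maxsize=None.
        maxsize = None if self.cachesize == -1 else self.cachesize
        self._cached = lru_cache(maxsize=maxsize)(self.stemfn)

    def stem(self, word):
        return self._cached(word)

    def cache_info(self):
        return self._cached.cache_info()


stemmer = CachedStemmer(toy_stem)
stemmer.cachesize = -1  # changing the attribute alone does not resize the cache...
stemmer.clear()         # ...clear() rebuilds it so the new size is used

words = ["indexing", "indexed", "documents", "indexing", "documents"]
print([stemmer.stem(w) for w in words])
print(stemmer.cache_info())
```

A bounded cache evicts older stems once it fills up; an unbounded one keeps every distinct word it has seen, trading memory for speed during a large batch.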
The only problem is that after setting cachesize = -1 and calling clear(), my documents are no longer indexed. Here is my schema:
from whoosh import fields
from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT, NUMERIC, ID, DATETIME

schema = Schema(
    title=TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=2.0),
    content=TEXT(stored=True, analyzer=StemmingAnalyzer()),
    owner=NUMERIC(stored=True),
    id=ID(stored=True, unique=True),
    date=DATETIME(stored=True, sortable=True),
    author=TEXT(stored=True),
    system=TEXT(stored=True),
    url=TEXT(stored=True),
    type=TEXT(stored=True),
    service=TEXT(stored=True),
    last_updated=fields.DATETIME)
How I index (from XML):
docs = xmlObj.findall('document')
for d in docs:
    ...
    writer.update_document(...)
writer.commit()
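The question elides how each <document> element becomes field values. A minimal sketch of that step uses the standard library's xml.etree.ElementTree, under stated assumptions: the sample XML and its element names (id, title, content) are hypothetical, chosen to match some of the schema fields, and document_fields is a made-up helper. In the real loop each resulting dict would be passed as keyword arguments, e.g. writer.update_document(**record), with a single writer.commit() after the loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real XML is not shown in the question.
SAMPLE_XML = """
<documents>
  <document><id>1</id><title>First</title><content>Indexing documents</content></document>
  <document><id>2</id><title>Second</title><content>Stemming words</content></document>
</documents>
"""

FIELD_NAMES = ("id", "title", "content")  # assumed to match schema field names


def document_fields(elem, field_names=FIELD_NAMES):
    """Map each child element whose tag is a field name to its stripped text."""
    values = {}
    for name in field_names:
        child = elem.find(name)
        if child is not None and child.text is not None:
            values[name] = child.text.strip()
    return values


xmlObj = ET.fromstring(SAMPLE_XML.strip())
docs = xmlObj.findall('document')
records = [document_fields(d) for d in docs]
for record in records:
    # In the real indexer this dict would go to writer.update_document(**record)
    print(record)
```

Building plain dicts first keeps the XML handling separate from the Whoosh writer, so the parsed values can be inspected before anything is written to the index.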
After changing the stemmer cache, nothing shows up when I do this:
for doc in ix.reader().iter_docs():
    # doc should be a tuple of (docnum, document)
    print("docnum: {}".format(doc[0]))
Answer 0 (score: 0)
It looks like you are only updating documents when you index.
So if there are no documents to begin with, nothing gets indexed!
Try writer.add_document(...) instead.