Documents are not indexed after changing the StemmingAnalyzer cache

Time: 2016-03-04 06:53:09

Tags: python caching indexing whoosh

According to the whoosh documentation, giving the StemmingAnalyzer an unbounded cache makes batch indexing faster:

writer = myindex.writer()
# Get the analyzer object from a text field
stem_ana = writer.schema["content"].format.analyzer
# Set the cachesize to -1 to indicate unbounded caching
stem_ana.cachesize = -1
# Reset the analyzer to pick up the changed attribute
stem_ana.clear()

# Use the writer to index documents...
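
For intuition about why an unbounded cache speeds up batch indexing, here is a minimal sketch in plain Python. It is not whoosh's implementation: `toy_stem` and `cached_stem` are hypothetical names, and `functools.lru_cache(maxsize=None)` only stands in for the idea of an unbounded stem cache.

```python
from functools import lru_cache

calls = 0  # counts how often the stemmer itself actually runs


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    global calls
    calls += 1
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# maxsize=None means the cache never evicts entries (assumed analogy to cachesize = -1)
cached_stem = lru_cache(maxsize=None)(toy_stem)

tokens = "indexing indexed indexes indexing indexed indexing".split()
stems = [cached_stem(t) for t in tokens]

# Six tokens were processed, but the stemmer only ran once per distinct word
print(calls, cached_stem.cache_info())
```

The trade-off is memory: an unbounded cache keeps one entry per distinct word it has seen, which is usually acceptable for a single batch run but grows with the vocabulary.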

The only problem is that documents are not indexed after doing this. Here is my schema:

schema = Schema(
                title=TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=2.0),
                content=TEXT(stored=True, analyzer=StemmingAnalyzer()),

                owner=NUMERIC(stored=True),
                id=ID(stored=True, unique=True),
                date=DATETIME(stored=True, sortable=True),
                author=TEXT(stored=True),
                system=TEXT(stored=True),
                url=TEXT(stored=True),
                type=TEXT(stored=True),
                service=TEXT(stored=True),
                last_updated=fields.DATETIME)

How I index (from XML):

docs = xmlObj.findall('document')
for d in docs:
    ...

    writer.update_document(...)

writer.commit()

After I changed the stemming analyzer's cache, nothing is printed when I do this:

for doc in ix.reader().iter_docs():
    #doc should be a tuple of (docnum, document)
    print("docnum: {}".format(doc[0]))

1 answer:

Answer 0 (score: 0)

It looks like you are only updating documents when you index.

So if there is no existing match for a document, nothing gets indexed!

Try writer.add_document(...)