Question

Per the Whoosh documentation here, you can give StemmingAnalyzer an unbounded cache to make batch indexing faster:

writer = myindex.writer()
# Get the analyzer object from a text field
stem_ana = writer.schema["content"].format.analyzer
# Set the cachesize to -1 to indicate unbounded caching
stem_ana.cachesize = -1
# Reset the analyzer to pick up the changed attribute
stem_ana.clear()

# Use the writer to index documents...
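The idea behind `cachesize = -1` is memoization: during a large batch the same words are stemmed over and over, so caching each word's stem avoids recomputing it. The sketch below illustrates only that idea, using the standard library. `toy_stem` is a hypothetical stand-in for a real stemmer (it just strips a few suffixes), and `functools.lru_cache(maxsize=None)` plays the role of an unbounded cache. This is not Whoosh's internal implementation.

```python
import functools

# Counts how many times the underlying (uncached) stemmer actually runs.
calls = {"n": 0}

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    calls["n"] += 1
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# maxsize=None means the cache never evicts entries (unbounded),
# similar in spirit to setting cachesize = -1.
cached_stem = functools.lru_cache(maxsize=None)(toy_stem)

words = ["indexing", "indexed", "documents", "indexing", "documents", "indexing"]
stems = [cached_stem(w) for w in words]
print(stems)
print("stemmer calls:", calls["n"])
```

Six words are stemmed, but the underlying function runs only once per distinct word. Repeated words are answered from the cache, which is where the batch-indexing speedup comes from.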

The only problem is that after doing this, my documents are not indexed. Here is my schema:

schema = Schema(
                title=TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=2.0),
                content=TEXT(stored=True, analyzer=StemmingAnalyzer()),

                owner=NUMERIC(stored=True),
                id=ID(stored=True, unique=True),
                date=DATETIME(stored=True, sortable=True),
                author=TEXT(stored=True),
                system=TEXT(stored=True),
                url=TEXT(stored=True),
                type=TEXT(stored=True),
                service=TEXT(stored=True),
                last_updated=fields.DATETIME)

How I index (from XML):

docs = xmlObj.findall('document')
for d in docs:
    ...

    writer.update_document(...)

writer.commit()
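For the XML side of the loop, a minimal sketch of turning each `<document>` element into a dict of field values might look like this. The child element names (`id`, `title`, `content`), the sample XML and the helper `documents_from_xml` are all hypothetical, since the question elides them. Each resulting dict is what you would pass as keyword arguments, e.g. `writer.update_document(**fields)`. Whoosh is not imported here, so the sketch runs with the standard library alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real XML structure is not shown in the question.
SAMPLE_XML = """
<documents>
  <document><id>1</id><title>First</title><content>Indexing with Whoosh</content></document>
  <document><id>2</id><title>Second</title><content>Stemming analyzers</content></document>
</documents>
"""

def documents_from_xml(xml_text, field_names=("id", "title", "content")):
    """Yield one dict of field values per <document> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    for d in root.findall("document"):
        fields = {}
        for name in field_names:
            # findtext returns None when the child element is missing.
            text = d.findtext(name)
            if text is not None:
                fields[name] = text
        yield fields

docs = list(documents_from_xml(SAMPLE_XML))
for fields in docs:
    # In the real loop this would be: writer.update_document(**fields)
    print(fields)
```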

After changing the stemming analyzer's cache, nothing shows up when I do this:

for doc in ix.reader().iter_docs():
    # doc should be a tuple of (docnum, stored_fields)
    print("docnum: {}".format(doc[0]))

Answer 1

It looks like you are only updating documents when you index.

So if no documents show up, nothing has been indexed!

Try writer.add_document(...) instead.

Documents not indexed after changing the StemmingAnalyzer cache
