We have a very basic Lucene setup. We recently noticed that some documents are not being written to the index.
This is how we create a document:
private void addToDirectory(SpecialDomainObject specialDomainObject) throws IOException {
    Document document = new Document();
    document.add(new TextField("id", String.valueOf(specialDomainObject.getId()), Field.Store.YES));
    document.add(new TextField("name", specialDomainObject.getName(), Field.Store.YES));
    document.add(new TextField("tags", joinTags(specialDomainObject.getTags()), Field.Store.YES));
    document.add(new TextField("contents", getContents(specialDomainObject), Field.Store.YES));
    for (Language language : getAllAssociatedLanguages(specialDomainObject)) {
        document.add(new IntField("languageId", language.getId(), Field.Store.YES));
    }
    specialDomainObjectIndexWriter.updateDocument(new Term("id", document.getField("id").stringValue()), document);
    specialDomainObjectIndexWriter.commit();
}
This is how we create the analyzer and the index writer:
<bean id="luceneVersion" class="org.apache.lucene.util.Version" factory-method="valueOf">
    <constructor-arg value="LUCENE_46"/>
</bean>
<bean id="analyzer" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
    <constructor-arg ref="luceneVersion"/>
</bean>
<bean id="specialDomainObjectIndexWriter" class="org.apache.lucene.index.IndexWriter">
    <constructor-arg ref="specialDomainObjectDirectory" />
    <constructor-arg>
        <bean class="org.apache.lucene.index.IndexWriterConfig">
            <constructor-arg ref="luceneVersion"/>
            <constructor-arg ref="analyzer" />
            <property name="openMode" value="CREATE_OR_APPEND"/>
        </bean>
    </constructor-arg>
</bean>
Indexing is done by a scheduled task:
@Component
public class ScheduledSpecialDomainObjectIndexCreationTask implements ScheduledIndexCreationTask {
    private static final Logger logger = LoggerFactory.getLogger(ScheduledSpecialDomainObjectIndexCreationTask.class);

    @Autowired
    private IndexOperator specialDomainObjectIndexOperator;

    @Scheduled(fixedDelay = 3600 * 1000)
    @Override
    public void createIndex() {
        Date indexCreationStartDate = new Date();
        try {
            logger.info("Updating complete special domain object index...");
            specialDomainObjectIndexOperator.createIndex();
            if (logger.isDebugEnabled()) {
                Date indexCreationEndDate = new Date();
                logger.debug("Index creation duration: {} ms", indexCreationEndDate.getTime() - indexCreationStartDate.getTime());
            }
        } catch (IOException e) {
            logger.error("Could not update complete special domain object index.", e);
        }
    }
}
createIndex() is implemented as follows:
@Override
public void createIndex() throws IOException {
    logger.trace("Preparing for index generation...");
    IndexWriter indexWriter = getIndexWriter();
    Date start = new Date();
    logger.trace("Deleting all documents from index...");
    indexWriter.deleteAll();
    logger.trace("Starting index generation...");
    long numberOfProcessedObjects = fillIndex();
    logger.debug("Index written in " + (new Date().getTime() - start.getTime()) + " milliseconds.");
    logger.debug("Number of processed objects: {}", numberOfProcessedObjects);
    logger.debug("Number of documents in index: {}", indexWriter.numDocs());
    indexWriter.commit();
    indexWriter.forceMerge(1);
}

@Override
protected long fillIndex() throws IOException {
    Page<SpecialDomainObject> specialDomainObjectsPage = specialDomainObjectRepository.findAll(new PageRequest(0, MAXIMUM_PAGE_ELEMENTS));
    while (true) {
        addToDirectory(specialDomainObjectsPage);
        if (specialDomainObjectsPage.hasNextPage()) {
            specialDomainObjectsPage =
                specialDomainObjectRepository.findAll(new PageRequest(specialDomainObjectsPage.getNumber() + 1, specialDomainObjectsPage.getSize()));
        } else {
            break;
        }
    }
    return specialDomainObjectsPage.getTotalElements();
}
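There are about 2000 SpecialDomainObject instances, and about 80 of them are not written to the index (we checked this with Luke).
Is there anything that could cause the missing documents?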
Answer 0 (score: 0)
We found the problem: the operating system's default encoding was not set to UTF-8.
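For readers who want to check whether their own JVM is in the same situation, here is a minimal sketch. It is not taken from the code in the question: the class name `DefaultCharsetCheck`, its helper methods, and the sample string are made up for illustration, and the answer does not say exactly where in the pipeline the encoding mattered. The sketch only prints the JVM's default charset and shows how the same UTF-8 bytes decode to different text under a non-UTF-8 charset, which is the kind of mismatch that can appear when text is converted without an explicit charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class DefaultCharsetCheck {

    // True if the JVM's default charset is UTF-8.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    // Decodes bytes with the given charset, mimicking a String or Reader
    // that was created with whatever charset happened to be the default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Default is UTF-8: " + defaultIsUtf8());

        // "Mueller" with a u-umlaut, written as a Unicode escape so the
        // example does not depend on the source file's own encoding.
        String original = "M\u00fcller";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset round-trips the text.
        String decodedAsUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Decoding with ISO-8859-1 turns the two-byte umlaut into two characters.
        String decodedAsLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("Round trip intact with UTF-8: " + original.equals(decodedAsUtf8));
        System.out.println("Round trip intact with ISO-8859-1: " + original.equals(decodedAsLatin1));
    }
}
```

If the default charset turns out not to be UTF-8, two common options are to start the JVM with `-Dfile.encoding=UTF-8` or to pass an explicit charset everywhere bytes are converted to text.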