Question

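We have a Grails project that runs behind a load balancer. Three instances of the Grails application run on the server, each in a separate Tomcat instance. Each instance has its own Searchable index. Because the indexes are separate, automatic updating is not enough to keep the indexes consistent across the application instances. We have therefore disabled Searchable's index mirroring, and the index is updated manually in a scheduled Quartz job. As far as we understand, no other part of the application should modify the index.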

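The Quartz job runs once a minute. It checks the database for rows the application has updated and reindexes those objects. The job also checks whether the same job is already running, so it never indexes concurrently. The application runs fine for a few hours after startup, and then suddenly, when the job starts, it throws a LockObtainFailedException: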

22.10.2012 11:20:40 [xxxx.ReindexJob] ERROR Could not update searchable index, class org.compass.core.engine.SearchEngineException: Failed to open writer for sub index [product]; nested exception is org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/home/xxx/tomcat/searchable-index/index/product/lucene-a7bbc72a49512284f5ac54f5d7d32849-write.lock

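According to the log of the job's previous run, reindexing completed without any errors and the job finished successfully. This time, however, the reindex operation throws a locking exception, as if the previous operation had not finished and the lock had not been released. The lock is not released until the application is restarted.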
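One way to picture why the lock survives: the exception names a SimpleFSLock, and as far as we know that lock type is simply a file (the write.lock file in the log) that exists while an index writer holds it. The Java sketch below is a simplified model, not Lucene's code; FileWriteLock and its methods are hypothetical names. Obtaining the lock atomically creates the file, releasing deletes it, so if the holder never releases, every later attempt fails until the file is removed.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Simplified, hypothetical model of a file-based write lock (not Lucene's SimpleFSLock itself). */
public class FileWriteLock {
    private final File lockFile;

    public FileWriteLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Creates the lock file; createNewFile is atomic and returns false if the file already exists. */
    public boolean obtain() {
        try {
            return lockFile.createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the lock file. If this is never called, the lock outlives its holder. */
    public void release() {
        lockFile.delete();
    }

    public boolean isLocked() {
        return lockFile.exists();
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "demo-" + System.nanoTime() + "-write.lock");

        FileWriteLock writer1 = new FileWriteLock(file);
        writer1.obtain(); // writer 1 opens the index ...
        // ... and is never closed, so release() is never called

        FileWriteLock writer2 = new FileWriteLock(file);
        System.out.println("writer 2 obtained: " + writer2.obtain()); // blocked by the leftover file

        writer1.release(); // e.g. a forced unlock removes the file
        System.out.println("writer 2 obtained: " + writer2.obtain());
        writer2.release();
    }
}
```

In this model nothing expires on its own: only deleting the file frees the lock, which is consistent with the lock staying in place until something outside the stuck writer intervenes.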

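We tried to work around the problem by manually releasing the lock on the locked index, which causes the following error to be printed to the log: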

22.10.2012 11:21:30 [manager.IndexWritersManager] ERROR Illegal state, marking an index writer as open, while another is marked as open for sub index [product]

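After this, the job appears to work normally and does not get stuck in the locked state again. However, the application then constantly uses 100% of the CPU. Below is a shortened version of the Quartz job code.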

Thanks in advance for any help in solving the problem.

class ReindexJob {

    def compass
    ...

    static Calendar lastIndexed

    static triggers = {
        // Every day every minute (at xx:xx:30), start delay 2 min
        // cronExpression:                           "s  m h D M W [Y]"
        cron name: "ReindexTrigger", cronExpression: "30 * * * * ?", startDelay: 120000
    }

    def execute() {
        if (ConcurrencyHelper.isLocked(ConcurrencyHelper.Locks.LUCENE_INDEX)) {
            log.error("Search index has been locked, not doing anything.")
            return
        }

        try {
            boolean acquiredLock = ConcurrencyHelper.lock(ConcurrencyHelper.Locks.LUCENE_INDEX, "ReindexJob")
            if (!acquiredLock) {
                log.warn("Could not lock search index, not doing anything.")
                return
            }

            Calendar reindexDate = lastIndexed
            Calendar newReindexDate = Calendar.instance
            if (!reindexDate) {
                reindexDate = Calendar.instance
                reindexDate.add(Calendar.MINUTE, -3)
                lastIndexed = reindexDate
            }

            log.debug("+++ Starting ReindexJob, last indexed ${TextHelper.formatDate("yyyy-MM-dd HH:mm:ss", reindexDate.time)} +++")
            Long start = System.currentTimeMillis()

            String reindexMessage = ""

            // Retrieve the ids of products that have been modified since the job last ran
            String productQuery = "select p.id from Product ..."
            List<Long> productIds = Product.executeQuery(productQuery, ["lastIndexedDate": reindexDate.time, "lastIndexedCalendar": reindexDate])

            if (productIds) {
                reindexMessage += "Found ${productIds.size()} product(s) to reindex. "

                final int BATCH_SIZE = 10
                Long time = TimeHelper.timer {
                    for (int inserted = 0; inserted < productIds.size(); inserted += BATCH_SIZE) {
                        log.debug("Indexing from ${inserted + 1} to ${Math.min(inserted + BATCH_SIZE, productIds.size())}: ${productIds.subList(inserted, Math.min(inserted + BATCH_SIZE, productIds.size()))}")
                        Product.reindex(productIds.subList(inserted, Math.min(inserted + BATCH_SIZE, productIds.size())))
                        Thread.sleep(250)
                    }
                }

                reindexMessage += " (${time / 1000} s). "
            } else {
                reindexMessage += "No products to reindex. "
            }

            log.debug(reindexMessage)

            // Re-index brands
            Brand.reindex()

            lastIndexed = newReindexDate

            log.debug("+++ Finished ReindexJob (${(System.currentTimeMillis() - start) / 1000} s) +++")
        } catch (Exception e) {
            log.error("Could not update searchable index, ${e.class}: ${e.message}")
            if (e instanceof org.apache.lucene.store.LockObtainFailedException || e instanceof org.compass.core.engine.SearchEngineException) {
                log.info("This is a Lucene index locking exception.")
                for (String subIndex in compass.searchEngineIndexManager.getSubIndexes()) {
                    if (compass.searchEngineIndexManager.isLocked(subIndex)) {
                        log.info("Releasing Lucene index lock for sub index ${subIndex}")
                        compass.searchEngineIndexManager.releaseLock(subIndex)
                    }
                }
            }
        } finally {
            ConcurrencyHelper.unlock(ConcurrencyHelper.Locks.LUCENE_INDEX, "ReindexJob")
        }
    }
}
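ConcurrencyHelper is the poster's own helper and its code is not shown. For comparison, here is a minimal, hypothetical sketch in Java of a per-JVM "skip if already running" guard (NonOverlappingJob is an invented name) in which only the run that actually set the flag clears it. In the job above, the early return when lock() fails still passes through the finally block that calls unlock(); whether that matters depends on how ConcurrencyHelper.unlock treats a lock it does not hold. A per-JVM guard is the relevant scope here, since each Tomcat instance has its own index.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard: at most one run of a periodic job at a time within one JVM. */
public class NonOverlappingJob {
    private final AtomicBoolean running = new AtomicBoolean(false); // true while a run is in progress
    private int completedRuns = 0; // only modified while the flag is held

    /** Runs work unless another run holds the flag; returns whether work ran. */
    public boolean execute(Runnable work) {
        // compareAndSet is atomic: only one caller can switch false -> true
        if (!running.compareAndSet(false, true)) {
            return false; // skipped; this call never held the flag, so it must not clear it
        }
        try {
            work.run();
            completedRuns++;
            return true;
        } finally {
            running.set(false); // released only by the run that acquired it
        }
    }

    public int completedRuns() {
        return completedRuns;
    }

    public static void main(String[] args) {
        NonOverlappingJob job = new NonOverlappingJob();
        boolean[] nested = new boolean[1];
        // The nested call simulates the trigger firing again while a run is still active
        boolean outer = job.execute(() -> nested[0] = job.execute(() -> {}));
        System.out.println("outer ran: " + outer + ", overlapping run ran: " + nested[0]
                + ", completed runs: " + job.completedRuns());
    }
}
```

Acquiring and checking in one atomic step also removes the separate isLocked() pre-check: there is no window between "check" and "lock" for another run to slip through.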

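Based on JMX CPU samples, it seems that Compass is doing some scheduling behind the scenes. Comparing one-minute CPU samples from a normal instance and from an instance at 100% CPU, a few things differ: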

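- org.apache.lucene.index.IndexWriter.doWait() takes up most of the CPU time.
- A Compass Scheduled Executor Thread appears in the thread list; it does not appear under normal conditions.
- One Compass Executor Thread is executing commitMerge; under normal conditions, none of these threads is executing commitMerge.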
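The thread observations above can also be checked from inside the application, without a JMX console, by listing live threads whose names match a fragment such as "Compass" together with the frame each one is currently executing. This is a hypothetical helper (ThreadSnapshot is an invented name) built only on the standard Thread.getAllStackTraces() call; the demo thread name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic helper: which live threads match a name, and what are they doing? */
public class ThreadSnapshot {

    /** Returns "name -> top stack frame" for each live thread whose name contains nameFragment. */
    public static List<String> threadsMatching(String nameFragment) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            String name = e.getKey().getName();
            if (name.contains(nameFragment)) {
                StackTraceElement[] stack = e.getValue();
                String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
                result.add(name + " -> " + top);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(500); // stand-in for a busy background thread
            } catch (InterruptedException ignored) {
                // demo thread: just exit
            }
        }, "Demo Executor Thread");
        worker.start();
        // In the application one would search for "Compass" instead
        threadsMatching("Executor Thread").forEach(System.out::println);
        worker.join();
    }
}
```

Logging this output periodically on a healthy instance and on one at 100% CPU would give a text-based version of the comparison described above.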

Answer 1

You could try increasing the 'compass.transaction.lockTimeout' setting. The default value is 10 (seconds).
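The lock timeout bounds how long a writer keeps retrying for a lock that is held elsewhere. The Java sketch below models that retry loop with a hypothetical TimedLock.obtain helper (not Compass's or Lucene's API). It polls at a fixed interval, the role compass.transaction.lockPollInterval appears to play in the settings below, and gives up once the timeout has passed, which is roughly the point where Lucene reports "Lock obtain timed out". A longer timeout helps when the previous writer is merely slow; it would not help if the lock is never released.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retry a non-blocking lock attempt until a timeout expires. */
public class TimedLock {

    /**
     * Calls tryObtain every pollMillis until it returns true or timeoutMillis has elapsed.
     * Returns false on timeout (or interrupt); a real caller would then throw.
     */
    public static boolean obtain(BooleanSupplier tryObtain, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (tryObtain.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // A lock that becomes free on the third attempt (the previous writer finishes)
        AtomicInteger attempts = new AtomicInteger();
        boolean slowWriter = obtain(() -> attempts.incrementAndGet() >= 3, 1_000, 10);

        // A lock that is never released (stale lock): no timeout is long enough
        boolean staleLock = obtain(() -> false, 50, 10);

        System.out.println(slowWriter + " after " + attempts.get() + " attempts; stale lock obtained: " + staleLock);
    }
}
```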

Another option is to disable concurrency in Compass and make it synchronous. This is controlled by the 'compass.transaction.processor.read_committed.concurrentOperations': 'false' setting. You may also need to set 'compass.transaction.processor' to 'read_committed'.

These are the Compass settings we currently use:

compassSettings = [
    'compass.engine.optimizer.schedule.period': '300',
    'compass.engine.mergeFactor': '1000',
    'compass.engine.maxBufferedDocs': '1000',
    'compass.engine.ramBufferSize': '128',
    'compass.engine.useCompoundFile': 'false',
    'compass.transaction.processor': 'read_committed',
    'compass.transaction.processor.read_committed.concurrentOperations': 'false',
    'compass.transaction.lockTimeout': '30',
    'compass.transaction.lockPollInterval': '500',
    'compass.transaction.readCommitted.translog.connection': 'ram://'
]

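With these settings, concurrency is turned off. You can make processing asynchronous by changing the 'compass.transaction.processor.read_committed.concurrentOperations' setting to 'true' (or by removing the entry).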

Compass configuration reference: http://static.compassframework.org/docs/latest/core-configuration.html

Documentation on concurrency in the read_committed processor: http://www.compass-project.org/docs/latest/reference/html/core-searchengine.html#core-searchengine-transaction-read_committed

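If you want to keep asynchronous operations, you can also control the number of threads they use. Setting compass.transaction.processor.read_committed.concurrencyLevel = 1 keeps operations asynchronous but uses only one thread (the default is 5 threads). There are also the compass.transaction.processor.read_committed.backlog and compass.transaction.processor.read_committed.addTimeout settings.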
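As a general Java analogy only (not Compass's implementation), a concurrency level behaves like the size of a worker pool: with one thread, queued operations still run asynchronously with respect to the caller, but never overlap each other. The hypothetical peakConcurrency helper below measures how many tasks run at the same time for a given pool size; the 20 ms sleep is an arbitrary stand-in for one index operation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: how many submitted operations run at once for a given pool size. */
public class ConcurrencyLevelDemo {

    public static int peakConcurrency(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger active = new AtomicInteger(); // tasks currently running
        AtomicInteger peak = new AtomicInteger();   // highest value 'active' reached
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for one index operation
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  peak " + peakConcurrency(1, 5));
        System.out.println("5 threads: peak " + peakConcurrency(5, 10));
    }
}
```

With a single thread the peak is always 1; with five threads it can reach up to 5, depending on scheduling.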

I hope this helps.

Searchable index gets locked on manual update (LockObtainFailedException)
