We have a DataStax Enterprise Solr cluster (version 4.5) on CentOS spanning two data centers (DC1 in Europe, DC2 in North America):
DC1: 2 nodes with rf set to 2
DC2: 1 node with rf set to 1
Each node has 2 cores and 4 GB of RAM. We have created only one keyspace. The 2 nodes in DC1 each hold 400 MB of data, while the node in DC2 is empty.
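As a rough sanity check on those numbers, assuming the ring spreads data evenly across each data center's nodes: with rf=2 on 2 nodes every DC1 node holds a full copy, so once a repair completes the single DC2 node (rf=1) should also end up with roughly 400 MB, plus its own Solr index files. The helper below is a hypothetical illustration of that arithmetic, not a Cassandra API.

```python
def expected_load_per_node(unique_data_mb: float, rf: int, nodes: int) -> float:
    """Expected data per node in one data center.

    Assumes the ring spreads data evenly over the DC's nodes and ignores
    compression, compaction overhead and Solr index files.
    """
    if rf > nodes:
        raise ValueError("this sketch assumes rf <= number of nodes in the DC")
    # Every row is stored rf times within the DC, spread over `nodes` machines.
    return unique_data_mb * rf / nodes


# Each DC1 node holds 400 MB with rf=2 on 2 nodes, i.e. a full copy,
# so the keyspace's unique data is roughly 400 MB.
unique_mb = 400.0
print("DC1 per node:", expected_load_per_node(unique_mb, rf=2, nodes=2))  # 400.0
print("DC2 per node:", expected_load_per_node(unique_mb, rf=1, nodes=1))  # 400.0
```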
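If I start nodetool repair on the node in DC2, the command runs fine for about 20-30 minutes, then it stops making progress and stays stuck.
In the log of the DC2 node I can see: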
WARN [NonPeriodicTasks:1] 2014-10-01 05:57:44,188 WorkPool.java (line 398) Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
millis, consider increasing it, or reducing load on the node.
ERROR [NonPeriodicTasks:1] 2014-10-01 05:57:44,190 CassandraDaemon.java (line 199) Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.solr.common.SolrException: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
millis, consider increasing it, or reducing load on the node.
at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:351)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.doCommit(AbstractSolrSecondaryIndex.java:994)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.forceBlockingFlush(AbstractSolrSecondaryIndex.java:139)
at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:338)
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:144)
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:113)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
millis, consider increasing it, or reducing load on the node.
at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)
at com.datastax.bdp.concurrent.WorkPool.flush(WorkPool.java:339)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.flushIndexUpdates(AbstractSolrSecondaryIndex.java:484)
at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:278)
... 12 more
WARN [commitScheduler-3-thread-1] 2014-10-01 05:58:47,351 WorkPool.java (line 398) Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
millis, consider increasing it, or reducing load on the node.
ERROR [commitScheduler-3-thread-1] 2014-10-01 05:58:47,352 SolrException.java (line 136) auto commit error...:org.apache.solr.common.SolrException: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
millis, consider increasing it, or reducing load on the node.
at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:351)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
millis, consider increasing it, or reducing load on the node.
at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)
at com.datastax.bdp.concurrent.WorkPool.flush(WorkPool.java:339)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.flushIndexUpdates(AbstractSolrSecondaryIndex.java:484)
at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:278)
... 8 more
I tried increasing some of the timeouts in cassandra.yaml, but with no luck. Thanks.
Answer 0 (score: 1)
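Your nodes are quite under-specced for a DSE Solr installation.
I would generally recommend at least 8 cores and at least 64 GB of RAM, with no more than 12-14 GB of that allocated to the heap.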
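To make the gap concrete, here is a small check that compares a node's specs against the figures in this answer (8 cores, 64 GB of RAM, a heap of at most 14 GB). The thresholds are this answerer's rule of thumb, not an official DataStax limit, and the `NodeSpec` and `shortfalls` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeSpec:
    cores: int
    ram_gb: int
    heap_gb: Optional[int] = None  # leave as None if the heap size is unknown


# Rule-of-thumb limits from this answer; hypothetical constants, not DSE settings.
MIN_CORES = 8
MIN_RAM_GB = 64
MAX_HEAP_GB = 14


def shortfalls(node: NodeSpec) -> List[str]:
    """List every way `node` falls outside the recommended sizing."""
    issues = []
    if node.cores < MIN_CORES:
        issues.append(f"cores: {node.cores} < {MIN_CORES}")
    if node.ram_gb < MIN_RAM_GB:
        issues.append(f"RAM: {node.ram_gb} GB < {MIN_RAM_GB} GB")
    if node.heap_gb is not None and node.heap_gb > MAX_HEAP_GB:
        issues.append(f"heap: {node.heap_gb} GB > {MAX_HEAP_GB} GB")
    return issues


# The nodes described in the question: 2 cores and 4 GB of RAM each.
print(shortfalls(NodeSpec(cores=2, ram_gb=4)))
# prints ['cores: 2 < 8', 'RAM: 4 GB < 64 GB']
```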
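The following troubleshooting guide is very good:
Your current data load is small, so you probably don't need the full amount of memory; my guess is that the bottleneck here is the CPUs.
If you aren't running 4.0.4 or 4.5.2, I would move to one of those versions.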
Answer 1 (score: 1)
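Two items that might help:
- The RuntimeException you see in the log is on the Lucene code path that commits index changes to disk, so I would definitely check whether disk writes are your bottleneck. (Are you using separate physical disks for your data and your commit log?)
- The parameter you may want to tune in the meantime is flush_max_time_per_core in dse.yaml, which controls the flush timeout in WorkPool.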
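One quick way to act on the first item is sketched below, assuming Python 3 on the Cassandra host and a user with write permission on the directories. `same_device` compares `st_dev` from `os.stat`, which only tells you whether two directories sit on the same filesystem (two partitions or LVM volumes on one physical disk would still compare as different), and `fsync_write_mb_per_s` gives a crude synchronous-write throughput figure. Both function names are hypothetical, and the paths are assumed defaults: check `data_file_directories` and `commitlog_directory` in cassandra.yaml for the real ones. Run it while the node is idle, since it competes for the same disk.

```python
import os
import tempfile
import time


def same_device(path_a: str, path_b: str) -> bool:
    """True if both paths are on the same filesystem device (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def fsync_write_mb_per_s(directory: str, total_mb: int = 64, chunk_kb: int = 1024) -> float:
    """Write total_mb of random data into a temp file in `directory`,
    fsync it, delete it, and return the observed MB/s."""
    chunk = os.urandom(chunk_kb * 1024)
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:  # closing f also closes fd
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force data to the device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_chunks * chunk_kb / 1024) / elapsed


if __name__ == "__main__":
    # Assumed default locations; replace with the values from cassandra.yaml.
    data_dir = "/var/lib/cassandra/data"
    commitlog_dir = "/var/lib/cassandra/commitlog"
    if os.path.isdir(data_dir) and os.path.isdir(commitlog_dir):
        print("same filesystem:", same_device(data_dir, commitlog_dir))
        for d in (data_dir, commitlog_dir):
            print(f"{d}: {fsync_write_mb_per_s(d):.1f} MB/s with fsync")
    else:
        print("Cassandra directories not found; adjust the paths")
```

A low fsync figure on the data directory, or both directories sharing one device, would support the disk-bottleneck theory; it does not by itself prove it.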
Answer 2 (score: 0)
One way to reduce Solr indexing contention is to increase the autoSoftCommit maxTime in solrconfig.xml:
<autoSoftCommit>
  <!-- milliseconds between automatic soft commits -->
  <maxTime>1000000</maxTime>
</autoSoftCommit>
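For scale: Solr's autoSoftCommit maxTime is in milliseconds, so 1000000 means a soft commit roughly every 16.7 minutes. The hypothetical helper below just does that conversion; it is not part of Solr or DSE. The trade-off is that documents indexed after the last soft commit are not visible to searches until the next one, so a large maxTime buys less indexing contention at the cost of search freshness.

```python
from typing import Tuple


def soft_commit_schedule(max_time_ms: int) -> Tuple[float, float]:
    """Convert an autoSoftCommit maxTime (milliseconds) into
    (minutes between soft commits, soft commits per hour)."""
    if max_time_ms <= 0:
        raise ValueError("maxTime must be positive")
    minutes_between = max_time_ms / 60_000
    per_hour = 3_600_000 / max_time_ms
    return minutes_between, per_hour


minutes, per_hour = soft_commit_schedule(1_000_000)
print(f"{minutes:.1f} min between soft commits, {per_hour:.1f} per hour")
# prints "16.7 min between soft commits, 3.6 per hour"
```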