Restarting failed/stalled streams during bootstrap of a new node

Asked: 2014-04-03 12:31:57

Tags: datastax-enterprise

We are trying to add a new Solr node to our cluster:

DC Cassandra

  • Cassandra node 1

DC Solr

  • Solr node 1 <- new node (actually a replacement for an old node)
  • Solr node 2
  • Solr node 3
  • Solr node 4
  • Solr node 5

During the bootstrap process:

  1. The stream from node 3 to node 1 failed with an exception:

      

    ERROR [STREAM-OUT-/IP_OF_NODE1] 2014-04-01 01:14:40,887 CassandraDaemon.java (line 196) Exception in thread Thread[STREAM-OUT-/IP_OF_NODE1,5,main]
    java.lang.NullPointerException
        at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249)
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375)
        at java.lang.Thread.run(Thread.java:744)

  2. The stream from node 4 to node 1 never started. The last relevant line in node 4's system.log is:

      

    Received streaming plan for Bootstrap

    It should have been followed by:

      

    Prepare completed. Receiving 0 files(0 bytes), sending x files(y bytes)

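  3. The bootstrap process now appears to have stalled, because the data file sizes are no longer changing. How can I force these streams to be retried?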

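    Edit

    I restarted all nodes today to try to force the new node to retry the bootstrap process. Unfortunately, it ran into stream failures again. This time, the exceptions on node 1 are as follows: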

    WARN [STREAM-IN-/IP_OF_NODE3] 2014-04-06 20:48:17,963 StreamSession.java (line 532) [Stream #c84effb0-bda9-11e3-a07d-89325af2f6bf] Retrying for following error
    java.lang.RuntimeException: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-tmp-jb-1209-Data.db (Too many open files)
        at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:75)
        at org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:71)
        at org.apache.cassandra.io.compress.CompressedSequentialWriter.open(CompressedSequentialWriter.java:42)
        at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:107)
        at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:60)
        at org.apache.cassandra.streaming.StreamReader.createWriter(StreamReader.java:111)
        at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:65)
        at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:47)
        at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:37)
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:283)
        at java.lang.Thread.run(Thread.java:724)
    Caused by: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-tmp-jb-1209-Data.db (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:71)
    ERROR [STREAM-IN-/78.46.63.218] 2014-04-06 20:48:17,964 StreamSession.java (line 418) [Stream #c84effb0-bda9-11e3-a07d-89325af2f6bf] Streaming error occurred
    java.lang.IllegalArgumentException: Unknown type 0
        at org.apache.cassandra.streaming.messages.StreamMessage$Type.get(StreamMessage.java:89)
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:283)
        at java.lang.Thread.run(Thread.java:724)
    

    The log contains a large number of similar errors, e.g.:

    ERROR [CompactionExecutor:129] 2014-04-06 20:50:06,401 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:129,1,main]
    java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-51-Data.db (Too many open files)
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:154)
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:137)
        at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:400)
        at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
        at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:833)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-51-Data.db (Too many open files)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:47)
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
        at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
        at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1195)
        at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:57)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:250)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1550)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
        at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
        at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:77)
        at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:84)
        at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:33)
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:148)
        ... 10 more
    Caused by: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-51-Data.db (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:43)
        ... 28 more
    

2 answers:

Answer 0 (score: 0)

This looks very similar to the following Cassandra bug/issue:

https://issues.apache.org/jira/browse/CASSANDRA-6965

I will follow up on it.

In the meantime, you can run a rebuild/repair on the new node.
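As a rough illustration of that suggestion, the sketch below builds the `nodetool` invocations one might use on the new node: `netstats` to inspect streaming activity, then `rebuild` or `repair`. The helper name `nodetool_cmd` and the default host `127.0.0.1` are assumptions made for this example, not part of the answer, and which of the two operations is appropriate depends on the cluster.

```python
import subprocess

# Subcommands relevant to the suggestion above; all three are standard nodetool commands.
ALLOWED = ("netstats", "rebuild", "repair")


def nodetool_cmd(action, host="127.0.0.1"):
    """Build the argument list for a nodetool call (hypothetical helper)."""
    if action not in ALLOWED:
        raise ValueError(f"unsupported nodetool action: {action}")
    return ["nodetool", "-h", host, action]


def run_nodetool(action, host="127.0.0.1"):
    """Run nodetool and return its stdout; raises if the command fails."""
    result = subprocess.run(
        nodetool_cmd(action, host), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_nodetool("netstats")` on the new node prints the streaming status, which helps confirm whether any stream sessions are still active before starting `run_nodetool("repair")`.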

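Edit: another Cassandra issue that appears to be related:

CASSANDRA-6984 - "NullPointerException in Streaming During Repair"

https://issues.apache.org/jira/browse/CASSANDRA-6984 The issue is marked as a Blocker, so it should get immediate attention. I have asked whether there is a workaround.

Stay tuned.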

Answer 1 (score: 0)

  

(Too many open files)

It looks like you need to increase your ulimit.
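Before raising the limit, it can help to confirm what the running Cassandra/DSE process is actually allowed. The sketch below, which assumes a Linux host with `/proc` available, reads the "Max open files" row from `/proc/<pid>/limits` and counts the descriptors currently open. The function names are made up for this example, and the PID must be supplied by you.

```python
import os
import resource
from pathlib import Path

ROW_LABEL = "Max open files"


def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from /proc/<pid>/limits text, or None."""
    for line in limits_text.splitlines():
        if line.startswith(ROW_LABEL):
            # Columns after the label: soft limit, hard limit, units.
            soft, hard = line[len(ROW_LABEL):].split()[:2]
            return soft, hard
    return None


def report(pid):
    """Return (limits, open_fd_count) for a running process (Linux only)."""
    limits = parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
    open_now = len(os.listdir(f"/proc/{pid}/fd"))
    return limits, open_now


if __name__ == "__main__":
    # Limit inherited by this Python process, i.e. what `ulimit -n` governs
    # for processes started from the same shell.
    print("this process (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Calling `report(pid)` with the PID of the Cassandra JVM (reading another user's `/proc/<pid>/fd` normally requires running as that user or as root) shows how close the process is to its soft limit. If the count is near the limit while streaming and compaction are running, that is consistent with the "Too many open files" errors in the question.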