When I restart a node in my cluster, I sometimes get this error message:
INFO [IndexSummaryManager:1] 2016-04-12 19:32:53,574 IndexSummaryRedistribution.java:74 - Redistributing index summaries
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:15,636 CassandraDaemon.java:195 - Exception in thread Thread[HintsWriteExecutor:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: bin/../data/hints/389cb0d3-87b9-4221-8352-065e8ce50fdb-1460462523225-1.crc32: File too large
at org.apache.cassandra.hints.HintsWriter.writeChecksum(HintsWriter.java:116) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriter.close(HintsWriter.java:124) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsStore.closeWriter(HintsStore.java:201) ~[apache-cassandra-3.2.1.jar:3.2.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: java.nio.file.FileSystemException: bin/../data/hints/389cb0d3-87b9-4221-8352-065e8ce50fdb-1460462523225-1.crc32: File too large
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_72]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_72]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_72]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_72]
at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[na:1.8.0_72]
at java.nio.file.Files.newOutputStream(Files.java:216) ~[na:1.8.0_72]
at org.apache.cassandra.hints.HintsWriter.writeChecksum(HintsWriter.java:110) ~[apache-cassandra-3.2.1.jar:3.2.1]
... 7 common frames omitted
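The node then starts shutting down, but an exception keeps being thrown and the operation is retried every few seconds. This is my main problem, because it prevents the cluster from accepting connections.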
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:15,638 StorageService.java:440 - Stopping gossiper
WARN [HintsWriteExecutor:1] 2016-04-12 20:02:15,651 StorageService.java:347 - Stopping gossip by operator request
INFO [HintsWriteExecutor:1] 2016-04-12 20:02:15,651 Gossiper.java:1455 - Announcing shutdown
DEBUG [HintsWriteExecutor:1] 2016-04-12 20:02:15,653 StorageService.java:1921 - Node /169.34.103.150 state shutdown, token [-1028691827956217809, -1257393635657191129, -1285475466194230655, -1398822673992383910, -1549844858878358481, -1638651369075180065, -1660825917518666149, -1802478872312866489, -1834618755337322564, -188187624477415935, -2034018930607672685, -210049110249018365, -2157250079133002505, -2171215058533514263, -2183510006393476006, -2193567329545672696, -2317710820662725097, -2319735333341559730, -2333531623390263516, -2458839661565177963, -2489690089103800827, -2710032230533922787, -2780200665893123668, -283628639049224915, -2886550293705646069, -293132189636842303, -2945647150702034785, -2965944925251907629, -2990231874502594267, -2991676811317743630, -2997538046339800243, -3176643432551515484, -3176889844544735478, -3201806929871841501, -3211631881211211792, -322057073400957538, -3242716520974847469, -3424682940569182570, -3441313897213257083, -3448874645237774640, -3452014929671888774, -3487048220426765500, -3523033168154067409, -3738270231064896111, -3792947624538231469, -3850123184653095411, -3859434367535677710, -3993763147657603241, -4010731345091378481, -4258687888114256086, -429860111391304244, -4318544125783476774, -4471769468265919226, -4588065176944445932, -4669765414071774677, -4670558952147294236, -4710259358376415554, -4907784900060021493, -4934593823248165235, -4934821923831720820, -5013288056003569837, -5110268421077583856, -5133973510660140774, -5159515181162633178, -5276029184678521021, -5286266972273013716, -540287937883749850, -5456649087226389873, -5495658378651725051, -5501165049612471047, -5535468008960837763, -5716046948204274477, -5721008906555397374, -5753456205099778029, -5770577886564351775, -5790919034460455792, -5929058167490034525, -5943865033771694477, -608562636816376813, -6109108822129963089, -6140834685397419488, -6170120179807852740, -6179956847809119210, -6245955388738336647, -6286189790411746933, 
-6299162407942815080, -6315904471665416400, -6364734987085439789, -6419018190136685454, -6451287738650323275, -6547213964231430849, -6548484474977763138, -6549052151069571925, -6698516302891374040, -6872407277556537836, -6901128430416607497, -6935384932230430038, -6937998036050345125, -7031528786091227188, -7106019277455303867, -7119774336125808637, -7191744312745689956, -7225558820114789693, -7349977359560186580, -7422626834116218143, -7431995410347149964, -7466585374358878727, -750799820874518113, -7610594360825096930, -7616154542884798259, -7629884030042898550, -7728553164832596613, -7789353727430662940, -779402220858888622, -7843332228444504745, -7854439386306622129, -815495344326874929, -8338520822777210140, -8649102261484559375, -8796027903791901112, -8898390484583881495, -8923261220379460832, -8943079358447105951, -9050583546904370510, -9080494386502531561, -9139630196350606101, -939306213156730751, -942614916980620152, 1049830730407134075, 1125127596836820990, 1133356003300268705, 1133623932124213230, 1247043876318218235, 1490023295198042772, 1497436537080113324, 1516791905674857253, 1603966065250122923, 1646125781869948326, 1740544126107535998, 1756218012030701589, 1804735370513211257, 1812139850525677114, 1819880350303805394, 1841691686666460445, 1888363141244474676, 2010883847009222978, 2016297526252235227, 2021110586668181290, 2084880932156441613, 2093427980091185166, 2112052724153374980, 2186638483475842552, 2195406825247987731, 2283720951686386464, 23875829161989945, 2521818329391092608, 2522645057607851918, 2524720168145693638, 2541003400153964040, 2650684785592761012, 2723290502273715430, 2808119513098478236, 2821997019638778146, 2891379770529557184, 2907285214187020532, 2963307217336709534, 3061757915053031951, 3122571062025066142, 3128771694670016319, 3130206542424936603, 3197285318974197102, 3218987271686146429, 3329594065878248111, 3331926835266199716, 3526280986313508860, 3542343528340649978, 3589794725284000659, 3610364312437568329, 
3701861372719378732, 373747767999916658, 3826422069022675393, 3856151860383170644, 3862031127704782057, 4049338078570571707, 4137865494092400430, 4241199357440741315, 4520402233521387342, 456519309520244643, 4715328215899051522, 4817677510120292180, 497627869146346949, 4995322204306807081, 5030633110404844305, 5038572404428039197, 5042627643214511398, 5281377762367584052, 5494577271219306513, 5530410713928998603, 5537215727145277166, 567120218785751902, 575743986375007756, 5784212620383428248, 5837914425280614947, 5977153680566647690, 5996674261833528410, 6083452088392300601, 6112178449036583235, 6264713703969393897, 6287772759341778176, 6314363909221383341, 6321343658409071604, 6475821468968027456, 6543311556613206558, 6912492987521221000, 6922280123185191829, 694545242943535806, 7183280296372529849, 7306070312091628992, 7412331756775823975, 7518294356359523088, 7567542433462808235, 7589810674525331548, 7637277610587157806, 773319528418720822, 7760484456189230502, 7816590204960057932, 7820991841796591957, 7836345109808448402, 7859570796753174, 8003409347394992259, 8012927612089894493, 8031750463661605171, 8051744553293723603, 8066222841813137181, 8073294271415597086, 8117861819974218900, 817982542709209563, 8198846486095494968, 8214665766962397555, 8277428606113435880, 8279100634451559360, 8316406004646641445, 8367052745804770548, 8373819718798220972, 8439087240414018142, 8444359473760446267, 8449256096936507263, 8717779586961798956, 8912188109780463904, 8920579922439529433, 8951968899880736480, 9043168611036813220, 9044578575232639242, 9045812874827336349, 9140634849238500115, 915715308827014103]
INFO [HintsWriteExecutor:1] 2016-04-12 20:02:15,653 StorageService.java:1924 - Node /169.34.103.150 state jump to shutdown
DEBUG [PendingRangeCalculator:1] 2016-04-12 20:02:15,655 PendingRangeCalculatorService.java:64 - finished calculation for 74 keyspaces in 0ms
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:17,655 StorageService.java:450 - Stopping native transport
INFO [HintsWriteExecutor:1] 2016-04-12 20:02:17,705 Server.java:182 - Stop listening for CQL clients
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:23,219 CassandraDaemon.java:195 - Exception in thread Thread[HintsWriteExecutor:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedChannelException
at org.apache.cassandra.hints.HintsWriter.newSession(HintsWriter.java:146) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.flushInternal(HintsWriteExecutor.java:221) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.flush(HintsWriteExecutor.java:203) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.lambda$flush$217(HintsWriteExecutor.java:196) ~[apache-cassandra-3.2.1.jar:3.2.1]
at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) ~[na:1.8.0_72]
at org.apache.cassandra.hints.HintsWriteExecutor.flush(HintsWriteExecutor.java:196) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.access$000(HintsWriteExecutor.java:36) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor$FlushBufferPoolTask.run(HintsWriteExecutor.java:155) ~[apache-cassandra-3.2.1.jar:3.2.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: java.nio.channels.ClosedChannelException: null
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) ~[na:1.8.0_72]
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300) ~[na:1.8.0_72]
at org.apache.cassandra.hints.HintsWriter.newSession(HintsWriter.java:142) ~[apache-cassandra-3.2.1.jar:3.2.1]
... 12 common frames omitted
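I googled this and believe the initial exception is caused by the node trying to replay a very large hint that it cannot load.
I tried to find parameters that would prevent this, but the only options I found were to turn off hinted handoff (hinted_handoff_enabled) or to shorten the window during which hints are generated (max_hint_window_in_ms). I don't think I can live with an inconsistent cluster, and I was hoping for an option to split hints into multiple files, but I could not find one.
Has anyone seen this problem before? Is there a way to split hints into multiple files? How should I handle this?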
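Edit: I searched through the configuration and found this:
max_hints_file_size_in_mb: 128
Considering the machine I'm running on, this value seems very reasonable to me. If my hints files are limited to 128 MB, I really don't understand why I got the exception above.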
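Running nodetool on this node threw an exception. The other nodes were fine, but I only ran nodetool the next morning (12 hours after the exception).
The file the exception complains about no longer exists, but the file name itself should be fine, because I have many other files with similar names (same length). Interestingly, the exception complains about the .crc32 file (not the .hints file).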
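To check whether any file in the hints directory actually exceeds the configured 128 MB, something like the following rough sketch could be used. It only lists file sizes and does not diagnose the error. The function names are made up for illustration, the `.hints`/`.crc32` suffixes are assumed from the file names discussed above, and the limit constant is copied by hand from the config line rather than read from cassandra.yaml.

```python
from pathlib import Path

# Assumption: copied by hand from "max_hints_file_size_in_mb: 128" in the config.
MAX_HINTS_FILE_SIZE_MB = 128

# Assumption: hint data files end in .hints and their checksums in .crc32.
HINT_SUFFIXES = (".hints", ".crc32")


def hint_file_sizes(hints_dir):
    """Return (file name, size in bytes) for each hint-related file, largest first."""
    entries = [
        (p.name, p.stat().st_size)
        for p in Path(hints_dir).iterdir()
        if p.is_file() and p.suffix in HINT_SUFFIXES
    ]
    return sorted(entries, key=lambda e: e[1], reverse=True)


def oversized(entries, limit_mb=MAX_HINTS_FILE_SIZE_MB):
    """Return the entries whose size is strictly above limit_mb megabytes."""
    limit_bytes = limit_mb * 1024 * 1024
    return [(name, size) for name, size in entries if size > limit_bytes]
```

For example, `oversized(hint_file_sizes("data/hints"))` would return the files above the limit, where `data/hints` stands in for wherever the hints directory lives on the node (the log shows it as `bin/../data/hints`). An empty result would suggest that the size of the hint files themselves is not what triggers "File too large".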