Aerospike suddenly crashes

Time: 2016-01-07 13:19:23

Tags: crash epoll key-value-store aerospike

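I am running a 5-node cluster on version 3.7.0.2, and after a few hours of use all 5 instances crashed. I have seen other crash reports for this version. Should I download version 3.7.1? Will it fix the crash?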

  

Linux aerospike2 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 15.10)

Configuration:

# Aerospike database configuration file.

service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 32
        batch-index-threads 32
    proto-fd-max 15000
        batch-max-requests 200000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address 10.240.0.6
        port 3000
    }

    heartbeat {
                mode mesh
                address 10.240.0.6  # IP of the NIC on which this node is listening
                mesh-seed-address-port 10.240.0.6 3002
                mesh-seed-address-port 10.240.0.5 3002

                port 3002

        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace test {
    replication-factor 10
    memory-size 3500M
    default-ttl 0 # 30 days, use 0 to never expire/evict.
        ldt-enabled true

    storage-engine device {
          file /data/aerospike.dat
          write-block-size 1M
          filesize 300G
          # data-in-memory true
        }
}

Logs:

Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services...
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ...
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal()
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]@bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]@bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx)
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message.
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal()
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ...
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake!
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084)  system memory: free 6590544kb ( 86 percent free ) 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090)  ClusterSize 5 ::: objects 13593274 ::: sub_objects 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099)  rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104)  replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116)    heartbeat_received: self 0 : foreign 322
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129)    tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203)    partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed)
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s)
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor)
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed]
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><>  Aerospike Community Edition build 3.7.1  <><><><><><><><><><>
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  user root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  group root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  pidfile /var/run/aerospike/asd.pid
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  service-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  transaction-queues 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  transaction-threads-per-queue 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         batch-index-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  proto-fd-max 15000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         batch-max-requests 200000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  # Log file must be an absolute path.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  file /var/log/aerospike/aerospike.log {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      context any info
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      #address any
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      port 3000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  heartbeat {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 mode mesh
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 mesh-seed-address-port 10.240.0.6 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 mesh-seed-address-port 10.240.0.5 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 port 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      interval 150
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      timeout 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  fabric {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      port 3001
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  info {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)      port 3003
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  replication-factor 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  memory-size 3500M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  default-ttl 0 # 30 days, use 0 to never expire/evict.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         ldt-enabled true
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  storage-engine device {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)           file /data/aerospike.dat
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)           write-block-size 1M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)           filesize 300G
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test"
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1%
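
The stack trace in the log above is easier to read as a call chain. The following is a minimal Python sketch, not an Aerospike tool, that pulls the function names out of `stacktrace: frame N:` lines and lists them from outermost caller to the point of failure. The function name `call_chain` and the default binary path are assumptions made for illustration; the regular expression is written only against the line format shown in this log.

```python
import re

# Matches lines such as:
#   ... stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
# Group 1: frame number, group 2: binary or library path, group 3: symbol name
# (empty for frames like "libc.so.6(+0x352f0)" that carry no symbol).
FRAME_RE = re.compile(
    r"stacktrace: frame (\d+): (\S+?)\(([^)+]*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]"
)


def call_chain(log_text, binary="/usr/bin/asd"):
    """Return symbol names of frames inside `binary`, outermost caller first."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(2) == binary and match.group(3):
            frames.append((int(match.group(1)), match.group(3)))
    # Frame 0 is the innermost frame, so sort descending to get caller -> callee.
    return [name for _, name in sorted(frames, reverse=True)]


if __name__ == "__main__":
    import sys

    # Usage: python call_chain.py /var/log/aerospike/aerospike.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(" -> ".join(call_chain(handle.read())))
```

Frames from shared libraries are skipped on purpose, so the output contains only the server's own functions.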

1 answer:

Answer 0 (score: 0)

This has been fixed in Aerospike server 3.7.1 and later.

More details about this issue and the Jira tickets:

[AER-4487], [AER-4690] - (Clustering/Migration) A race condition caused an incorrect heartbeat fd to be saved, which later could not be removed.
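
The CRITICAL line in the question's log reports errno 9 (Bad file descriptor) from an epoll operation on an invalid fd. As a rough illustration of that class of error, and not a reproduction of the Aerospike bug, here is a minimal Python sketch showing that Linux rejects an epoll registration for a descriptor that has already been closed. The function name `register_closed_fd` is hypothetical, and the sketch assumes a Linux host, since `select.epoll` is only available there.

```python
import errno
import select
import socket


def register_closed_fd():
    """Register an already-closed socket fd with epoll; return the errno raised, if any."""
    ep = select.epoll()
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        fd = sock.fileno()
        sock.close()  # fd is now stale, as if another code path had closed it
        try:
            ep.register(fd, select.EPOLLIN)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        ep.close()


if __name__ == "__main__":
    err = register_closed_fd()
    print(err, errno.errorcode.get(err))
```

A server that treats this error as fatal will abort, as the log in the question shows, so any race that leaves a stale or invalid fd in use can bring down the process.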

See also:

https://discuss.aerospike.com/t/aerospike-crash/2327