Question

Case: writes fail for some keys, and retries fail as well, with the following errors appearing in the server log:

Aug 30 2016 07:14:58 GMT: WARNING (drv_ssd): (drv_ssd.c:1225) read: bad block magic offset 1704448
Aug 30 2016 07:14:58 GMT: WARNING (drv_ssd): (drv_ssd.c:1283) get_key: failed as_storage_record_read_ssd()
Aug 30 2016 07:14:58 GMT: WARNING (rw): (thr_rw.c:3440) {userdata} write_local: can't get stored key: 0x0ac772018687b572e1a9be79ad0c168dccbee955

Here is the configuration file used on all 3 nodes:

service {
  user root
  group root
  paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
  pidfile /var/run/aerospike/asd.pid

  ## SET TO NUMBER OF CORES ##
  service-threads 8
  transaction-queues 8
  scan-threads 8
  ###########################

  ## DONT CHANGE ##
  transaction-threads-per-queue 3
  proto-fd-idle-ms 600000
  proto-fd-max 100000
  batch-max-requests 10000
  migrate-threads 2
  replication-fire-and-forget true
  ##########################
}

logging {
  file /var/log/aerospike/aerospike.log {
    context any info
  }
}

network {
  service {
    address any
    port 3000
  }

  heartbeat {
    mode mesh
    port 3002

    mesh-seed-address-port 10.0.23.46 3002
    mesh-seed-address-port 10.0.23.7 3002
    mesh-seed-address-port 10.0.23.52 3002

    interval 150
    timeout 20
  }

  fabric {
    port 3001
  }

  info {
    port 3003
  }
}

namespace userdata {
  replication-factor 2
  #### CHANGE FOR INSTANCE ###
  memory-size 30G
  ############################
  default-ttl 0 # 30 days, use 0 to never expire/evict.
  storage-engine device {
    ## COLD START AND NO SHADOW DEVICE ##
    cold-start-empty true
    device /dev/xvdf
    #####################################
    ### 1MB FOR INSTANCE STORE ###
    write-block-size 1024K
    #############################
  }
# storage-engine memory
}

namespace user_config_data {
        replication-factor 2
        memory-size 5G
        default-ttl 0
        storage-engine device {
                cold-start-empty true
                device /dev/xvdf
                write-block-size 1024K
        }
}

Currently, we have no data in the namespace user_config_data.

Note: a few days later, Aerospike was restarted on all 3 nodes at the same time, which resulted in the loss of all data.

Answer 1

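You are using the same device for two namespaces. That is wrong. There are too many misconfigured elements here to pin down what went wrong. I strongly suggest discussing this on the Aerospike forum at http://discuss.aerospike.com/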
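
The shared-device problem can be checked mechanically before a node is started. The following is a minimal sketch, not an Aerospike tool: the function name shared_devices is hypothetical, and the parser only assumes the simple layout of the configuration posted in the question (one "namespace NAME {" line per namespace, one "device PATH" line per device, "#" for comments).

```python
import re
from collections import defaultdict


def shared_devices(config_text):
    """Return {device: [namespaces]} for devices listed under more than one namespace.

    Hypothetical helper: it does not track braces, it simply attributes each
    "device PATH" line to the most recent "namespace NAME {" line above it.
    """
    users = defaultdict(list)
    current = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        ns = re.match(r"namespace\s+(\S+)\s*\{", line)
        if ns:
            current = ns.group(1)
            continue
        # "storage-engine device {" does not match: the line must start with "device".
        dev = re.match(r"device\s+(\S+)$", line)
        if dev and current is not None and current not in users[dev.group(1)]:
            users[dev.group(1)].append(current)
    return {d: names for d, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    sample = """
    namespace userdata {
      storage-engine device {
        device /dev/xvdf
      }
    }
    namespace user_config_data {
      storage-engine device {
        device /dev/xvdf
      }
    }
    """
    for device, namespaces in shared_devices(sample).items():
        print(device, "is used by", ", ".join(namespaces))
```

An empty result means no device appears under two namespaces. For the configuration in the question, /dev/xvdf would be reported for both userdata and user_config_data.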

Answer 2

"A few days later, Aerospike was restarted on all 3 nodes at the same time, which resulted in the loss of all data."

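You can recover the data by shutting the nodes down again, removing the cold-start-empty parameter, and then restarting them. After that, you would set cold-start-empty again and deal with the deletes that have come back.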
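
The configuration edit in that procedure can be sketched as a small text transformation. This is only an illustration under stated assumptions: the helper name without_cold_start_empty is hypothetical, it comments the line out rather than deleting it (so it is easy to restore afterwards), and it does not stop or start any node.

```python
def without_cold_start_empty(config_text):
    """Return config_text with every active cold-start-empty line commented out.

    Hypothetical helper for the recovery restart described above. Lines that
    are already comments are left untouched.
    """
    out = []
    for line in config_text.splitlines():
        active = line.split("#", 1)[0].strip()  # text before any comment
        if active.startswith("cold-start-empty"):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + "# " + line.strip())
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = (
        "storage-engine device {\n"
        "  cold-start-empty true\n"
        "  device /dev/xvdf\n"
        "}\n"
    )
    print(without_cold_start_empty(before))
```

Once the nodes are back up with their data, the original file can be restored to re-enable cold-start-empty, as the answer describes.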

Aerospike writes failing with server error 1
