How to safely scale down an Elasticsearch cluster on AWS?

Asked: 2020-04-23 23:27:02

Tags: elasticsearch

I am using an AWS Elasticsearch cluster that currently has 9 instances, with 1 shard and 8 replicas. When I try to scale the cluster down to 2 instances (1 shard and 1 replica), I get an error.

I expected it to remove the extra replicas from the cluster. Why doesn't it let me do this? Is it trying to merge the extra replicas into one? What is the right way to resolve this?

2 Answers:

Answer 0 (score: 1)

Scaling down an Elasticsearch cluster

Elasticsearch must be resilient to the failure of individual nodes. It achieves this resilience by considering a cluster-state update successful only after a quorum of nodes has accepted it. A quorum is a carefully chosen subset of the master-eligible nodes in the cluster.

The quorum size must be chosen carefully so that the cluster cannot elect two independent masters that make inconsistent decisions and ultimately lose data. Read more..
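On Elasticsearch 6.x and earlier, the quorum size is set explicitly in the node configuration; a minimal elasticsearch.yml sketch, assuming three master-eligible nodes (the node count here is an illustration, not taken from the question):

    # quorum = (master_eligible_nodes / 2) + 1 = (3 / 2) + 1 = 2
    discovery.zen.minimum_master_nodes: 2

With fewer than this many master-eligible nodes reachable, the cluster refuses to elect a master rather than risk a split brain, which is why this value must be rechecked before removing nodes.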

Preparation before scaling down

  • Back up the cluster so you can recover if something goes wrong.
  • Check the minimum master nodes setting in the master node configuration.
  • Stop all writes to the cluster, because a failover in the middle of the scale-down would be unsafe; this is not strictly required if everything goes smoothly.
  • Make sure you do not overload the cluster by leaving it with too little disk space and memory, otherwise the cluster will simply become read-only once the Low disk watermark is reached.
  • Lower the index replication factor to 1 to save space and speed up shard relocation during scaling, since fewer shards need to be created and moved. This also reclaims a lot of space held by duplicated data.

    curl -X PUT "localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d'
    {
        "index" : {
           "number_of_replicas" : 1
        }
    }
    '
    
  • Rebalance the cluster gracefully before you start scaling down.

  • The cluster must stay green and healthy; check the shards and status.

    Health

    curl -X GET "localhost:9200/_cluster/health?pretty"
    

    Expected output

    {
      "cluster_name" : "\"es-data-cluster\"",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }
    

    Shards

    curl -X GET "localhost:9200/_cat/shards"
    

    Expected output

    twitter 2 p STARTED    0   0b 172.18.0.2 es-node
    twitter 1 p STARTED    0   0b 172.18.0.2 es-node
    twitter 0 p STARTED    0 230b 172.18.0.2 es-node
    

Scaling down is best done when the cluster status is green and all shards are STARTED.

Steps to scale down

  • Remove one data node - the cluster will go into yellow status. Now observe:

    • the cluster logs
    • the STARTED and UNASSIGNED shards

    If the logs show Marking shards as stale, that shard is no longer available and will be removed. Elasticsearch's built-in rebalancing then kicks in and redistributes the shards.

    curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
    

    This command explains the shard allocation in the cluster in detail.

  • Wait for green - the cluster has then re-replicated the missing shards.
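Instead of polling manually, the health API can block until the desired status is reached; a sketch in the same curl style as above (the timeout value is illustrative):

    curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=60s&pretty"
    

If the timeout elapses before the cluster turns green, the response is returned with "timed_out" : true, so check that field before removing the next node.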

If the cluster health is red, at least one primary shard is unassigned; you need to investigate the unassigned shards before going any further.

Answer 1 (score: 0)

Your settings do not change just because you scale the cluster down. Since you specified 1 primary shard and 8 replicas, Elasticsearch tries to relocate those replicas to other nodes. You need to update the settings first (1 primary shard and 1 replica) and then scale the cluster down.

PUT /INDEX_NAME_HERE/_settings
{
    "index" : {
        "number_of_replicas" : 1
    }
}
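To confirm the change took effect before scaling down, the index settings can be read back (INDEX_NAME_HERE is the same placeholder as above):

    GET /INDEX_NAME_HERE/_settings

The response should show "number_of_replicas" : "1" under the index settings; only then is it safe to start removing nodes.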