Question

We have a GKE cluster with:

a master at version 1.6.13-gke.0
2 node pools at version 1.6.11-gke.0

We have enabled Stackdriver Monitoring and Logging.

On 2018-01-22, Google upgraded the master to version 1.7.11-gke.1.

After the upgrade, we are seeing a lot of errors:

import sys

# translated to Python from http://www.bluetulip.org/2014/programs/primitive.js
# (some rights may remain with the author of the above javascript code)

def isNotPrime(possible):
    # We only test this here to protect people who copy and paste
    # the code without reading the first sentence of the answer.
    # In an application where you know the numbers are prime you
    # will remove this function (and the call). If you need to
    # test for primality, look for a more efficient algorithm, see
    # for example Joseph F's answer on this page.
    i = 2
    while i*i <= possible:
        if (possible % i) == 0:
            return True
        i = i + 1
    return False

def primRoots(theNum):
    if isNotPrime(theNum):
        raise ValueError("Sorry, the number must be prime.")
    o = 1
    roots = []
    r = 2
    while r < theNum:
        k = pow(r, o, theNum)
        while (k > 1):
            o = o + 1
            k = (k * r) % theNum
        if o == (theNum - 1):
            roots.append(r)
        o = 1
        r = r + 1
    return roots

print(primRoots(int(sys.argv[1])))

These messages flood our logs with about 25 GB per day. They are generated by the pods managed by a DaemonSet named fluentd-gcp-v2.0.9.

We found that this bug was fixed in 1.8 and backported to 1.7.12.
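Whether a given master already carries the fix can be read off its version string. The following is a minimal sketch that assumes GKE versions have the form `MAJOR.MINOR.PATCH-gke.N` and takes the statement above at face value (fixed in 1.8, backported to 1.7.12). The helper names `parse_gke_version` and `has_fluentd_flood_fix` are hypothetical, and the 1.7.12 and 1.8 version strings in the usage example are illustrative, not confirmed releases.

```python
import re

# Assumed format of a GKE version string, e.g. "1.7.11-gke.1".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Parse a GKE version into a comparable tuple (major, minor, patch, gke_build)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unexpected GKE version format: %r" % version)
    return tuple(int(part) for part in match.groups())


def has_fluentd_flood_fix(version):
    """Assumption from the question: fixed in 1.8, backported to 1.7.12."""
    major, minor, patch, _ = parse_gke_version(version)
    if (major, minor) >= (1, 8):
        return True
    return (major, minor) == (1, 7) and patch >= 12


if __name__ == "__main__":
    # Illustrative version strings only.
    for v in ["1.6.13-gke.0", "1.7.11-gke.1", "1.7.12-gke.0", "1.8.5-gke.0"]:
        print(v, has_fluentd_flood_fix(v))
```

Comparing parsed integer tuples avoids the classic string-comparison trap where "1.7.9" would sort after "1.7.12".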

My questions are:

1. Should we upgrade the master to version 1.7.12? Is it safe to do so? Or
2. Is there any other way to test this before upgrading?

Thanks in advance.

Answer 1

First, the answer to question 2.

As an alternative, we could:

filter fluentd to ignore the logs coming from the fluentd-gcp pods, or
deactivate Stackdriver Monitoring and Logging
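The first alternative boils down to one rule: drop every record emitted by a fluentd-gcp pod before it is shipped to Stackdriver. The Python sketch below only illustrates that rule; it is not fluentd configuration. The record layout (a dict with a `kubernetes` entry holding `namespace_name` and `pod_name`) and the pod-name suffix in the sample are assumptions, and `is_fluentd_gcp_record` and `filter_records` are hypothetical helpers.

```python
# Illustrative only: log records are modelled as plain dicts, and the
# "kubernetes" / "namespace_name" / "pod_name" keys are assumed, not taken
# from the actual fluentd-gcp configuration.


def is_fluentd_gcp_record(record):
    """Return True if the record was emitted by a fluentd-gcp pod."""
    meta = record.get("kubernetes", {})
    return (meta.get("namespace_name") == "kube-system"
            and meta.get("pod_name", "").startswith("fluentd-gcp"))


def filter_records(records):
    """Keep every record except those coming from fluentd-gcp pods."""
    return [r for r in records if not is_fluentd_gcp_record(r)]


if __name__ == "__main__":
    sample = [
        # Hypothetical pod name: DaemonSet name plus a random suffix.
        {"kubernetes": {"namespace_name": "kube-system",
                        "pod_name": "fluentd-gcp-v2.0.9-abcde"},
         "log": "noisy message"},
        {"kubernetes": {"namespace_name": "default", "pod_name": "web-1"},
         "log": "GET /"},
    ]
    print(filter_records(sample))
```

Matching on both namespace and name prefix keeps the rule narrow, so an application pod that happens to be called `fluentd-gcp-…` in another namespace would still be logged.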

Answer to question 1:

We upgraded a test environment to 1.7.12. The process took 3 minutes. During that time we could neither edit our cluster nor access it with kubectl (as expected).

After the upgrade, we deleted all pods named fluentd-gcp-*, and the flood stopped immediately:

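# Delete the fluentd-gcp pods one at a time; the DaemonSet recreates each one.
for pod in $(kubectl get pods -n kube-system | grep fluentd-gcp | awk '{print $1}'); do
    kubectl -n kube-system delete pod "$pod"
    sleep 20  # let the replacement pod start before deleting the next one
done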
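The same one-at-a-time restart can also be scripted with the official Kubernetes Python client (the `kubernetes` package), assumed here to be installed and configured through the same kubeconfig that kubectl uses. This sketch keeps the 20-second pause so the DaemonSet can recreate each pod before the next one is removed. `select_fluentd_pods` and `restart_fluentd_pods` are hypothetical helper names. Unlike `grep`, which matches anywhere in the line, the sketch only matches names that start with the prefix.

```python
import time


def select_fluentd_pods(pod_names, prefix="fluentd-gcp"):
    """Return the pod names that start with the given prefix."""
    return [name for name in pod_names if name.startswith(prefix)]


def restart_fluentd_pods(namespace="kube-system", pause_seconds=20):
    """Delete fluentd-gcp pods one by one; their DaemonSet recreates them."""
    # Imported here so select_fluentd_pods works without the package installed.
    from kubernetes import client, config

    config.load_kube_config()  # same credentials and context as kubectl
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace)
    names = [pod.metadata.name for pod in pods.items]
    for name in select_fluentd_pods(names):
        v1.delete_namespaced_pod(name, namespace, body=client.V1DeleteOptions())
        print("deleted", name)
        time.sleep(pause_seconds)


if __name__ == "__main__":
    restart_fluentd_pods()
```

Splitting the name selection into a pure function makes it possible to check which pods would be deleted before running anything against the cluster.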

Log flood after the master was upgraded from 1.6.13-gke.0 to 1.7.11-gke.1
