Question

I am deploying the TensorFlow Serving image (tensorflow/serving:latest-gpu) on Kubernetes (GKE) with GPU nodes (K80) hosted on GCP.

Command:

command: ["tensorflow_model_server"]
args: ["--port=8500", "--rest_api_port=8501", "--enable_batching", "--batching_parameters_file=/etc/config/batching_parameters", "--model_config_file=/etc/config/model_config"]

Batching parameters:

maxBatchSize: 4096
batchTimeoutMicros: 25000
maxEnqueuedBatches: 16
numBatchThreads: 16
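For reference, the file passed to --batching_parameters_file is read as a text-format protobuf in which each field wraps its value, as in max_batch_size { value: 128 } in the TensorFlow Serving batching guide; the camelCase names above look like shorthand. A minimal sketch that renders the same four values in that format, assuming the snake_case field names max_batch_size, batch_timeout_micros, max_enqueued_batches and num_batch_threads (the helper render_batching_parameters is hypothetical):

```python
# Hypothetical helper: render the batching parameters above in protobuf text
# format, where each field wraps its number in "{ value: ... }".
BATCHING_PARAMS = {
    "max_batch_size": 4096,
    "batch_timeout_micros": 25000,
    "max_enqueued_batches": 16,
    "num_batch_threads": 16,
}


def render_batching_parameters(params: dict) -> str:
    """Return the text body to write to --batching_parameters_file."""
    lines = [f"{name} {{ value: {value} }}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batching_parameters(BATCHING_PARAMS), end="")
```

Writing the output to /etc/config/batching_parameters (for example via a ConfigMap) keeps the file in the format the server parses.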

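I use --model_config_file to load versioned models from a GCS bucket. TensorFlow Serving pulls each new version of the model and loads it; once that is done, it unloads the old version (but it seems to keep it in memory).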
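A model_config_file can also state explicitly how many versions should be served at once. A sketch assuming the standard model_config_list text format, with a hypothetical model name and bucket path; as far as I know, latest with num_versions: 1 is already the default policy, so this only makes the behaviour explicit. It controls which versions TensorFlow Serving keeps loaded, not whether freed memory is handed back to the operating system.

```python
def render_model_config(name: str, base_path: str, num_versions: int = 1) -> str:
    """Return a model_config_list that serves only the newest num_versions."""
    return (
        "model_config_list {\n"
        "  config {\n"
        f"    name: '{name}'\n"
        f"    base_path: '{base_path}'\n"
        "    model_platform: 'tensorflow'\n"
        "    model_version_policy {\n"
        f"      latest {{ num_versions: {num_versions} }}\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical model name and GCS path, for illustration only.
    print(render_model_config("my_model", "gs://my-bucket/my_model"), end="")
```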

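When I set limits/requests below the maximum resources available on the host, the pod ends up OOMKilled because it goes over the maximum memory it is allowed. However, when the limits/requests match the maximum resources available on the host (a dedicated node), the memory seems to get flushed so that usage stays within that maximum.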

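Do you know whether we can set a maximum memory for TensorFlow, or tell it to respect the cgroup memory limit (used by Docker/Kubernetes)? Can we flush old model versions to free memory? Also, every request increases memory usage, and the memory is never released. Any ideas?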
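To see which memory limit Kubernetes actually applied to the container, you can read it from the cgroup filesystem inside the pod. A minimal sketch, assuming the usual mount points for cgroup v2 (memory.max) and cgroup v1 (memory.limit_in_bytes); it only reads the limit and does not make TensorFlow Serving respect it:

```python
from pathlib import Path
from typing import Optional

# Assumed locations inside the container; which one exists depends on the
# cgroup version used by the node.
CGROUP_V2_MAX = "/sys/fs/cgroup/memory.max"
CGROUP_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def read_memory_limit(paths=(CGROUP_V2_MAX, CGROUP_V1_LIMIT)) -> Optional[int]:
    """Return the container memory limit in bytes, or None if unlimited/unknown."""
    for path in paths:
        p = Path(path)
        if not p.exists():
            continue
        raw = p.read_text().strip()
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        value = int(raw)
        # cgroup v1 reports a huge sentinel (close to 2**63) when unlimited.
        if value >= 2**62:
            return None
        return value
    return None


if __name__ == "__main__":
    limit = read_memory_limit()
    print("no limit detected" if limit is None else f"{limit / 2**30:.1f} GiB")
```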

Node info:
7 vCPUs
30 GB RAM
1 GPU (K80)

Model size: ~8 GB

Memory limits/requests of 20 GB or 30 GB -> OOMKilled after several model versions have been loaded:

Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

No limits/requests -> Kubernetes evicts the TensorFlow Serving pod because of its heavy memory consumption:

Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: memory. Container tensorserving was using 24861136Ki, which exceeds its request of 0.
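The eviction message reports usage in KiB. Converting it, and assuming the ~8 GB model and the 20/30 GB limits above are GiB, gives roughly 23.7 GiB, close to three full copies of the model, which would be consistent with old versions staying resident. A rough arithmetic sketch that ignores all other process overhead (helper names are hypothetical; only binary suffixes such as Ki and Gi are handled, not decimal ones like G):

```python
# Binary suffixes used in Kubernetes resource quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert an integer quantity such as '24861136Ki' or '20Gi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already bytes


def copies_that_fit(limit_bytes: int, model_bytes: int) -> int:
    """Whole model copies that fit under a limit, ignoring all other overhead."""
    return limit_bytes // model_bytes


if __name__ == "__main__":
    model = 8 * 2**30  # assumed: ~8 GiB per loaded model version
    used = quantity_to_bytes("24861136Ki")
    print(f"used at eviction: {used / 2**30:.2f} GiB")  # prints 23.71 GiB
    for limit in ("20Gi", "30Gi"):
        print(limit, "->", copies_that_fit(quantity_to_bytes(limit), model), "copies")
```

Under these assumptions a 20Gi limit holds two copies and a 30Gi limit three, so keeping even one retired version next to the live one leaves little headroom.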

Thanks

Regards
Vince

Answer 1

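As a workaround, I switched from the default memory allocator (malloc) to tcmalloc, Google's memory allocator implementation. It solved my problem without any performance issues.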

(It's an ugly deployment file, but it keeps things simple to read.)
Kubernetes deployment for TensorFlow Serving:

spec:
  containers:
    - name: tensorserving
      image: tensorflow/serving:1.14.0-gpu
      # Install tcmalloc at startup, then preload it in place of the default malloc.
      command: ["sh", "-c"]
      args:
        - "apt-get update && apt-get install google-perftools -y && LD_PRELOAD=/usr/lib/libtcmalloc.so.4 tensorflow_model_server --port=8500 --rest_api_port=8501 --monitoring_config_file=/etc/config/monitoring_config --enable_batching --batching_parameters_file=/etc/config/batching_parameters --model_config_file=/etc/config/model_config"
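To confirm that the server process really picked up tcmalloc after the pod starts, you can look for libtcmalloc in its memory mappings under /proc. A minimal sketch, assuming Python 3 is available where it runs (the serving image may not ship it, in which case the same check can be done by hand on /proc/<pid>/maps). Linux truncates /proc/<pid>/comm to 15 characters, so tensorflow_model_server shows up there as tensorflow_mode:

```python
from pathlib import Path
from typing import List

# Linux keeps at most 15 characters of a process name in /proc/<pid>/comm.
COMM_MAX_LEN = 15


def uses_tcmalloc(maps_text: str) -> bool:
    """Return True if a /proc/<pid>/maps dump contains a libtcmalloc mapping."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Fields: address perms offset dev inode [pathname]
        if len(fields) >= 6 and "libtcmalloc" in Path(fields[5]).name:
            return True
    return False


def find_pids(command_name: str, proc: Path = Path("/proc")) -> List[int]:
    """Return PIDs whose (truncated) comm matches command_name."""
    if not proc.is_dir():
        return []
    wanted = command_name[:COMM_MAX_LEN]
    pids = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or its comm is not readable
        if comm == wanted:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_pids("tensorflow_model_server"):
        maps = Path(f"/proc/{pid}/maps").read_text()
        status = "tcmalloc loaded" if uses_tcmalloc(maps) else "no tcmalloc mapping"
        print(pid, status)
```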

TensorFlow Serving on Kubernetes: pod OOMKilled or evicted
