SageMaker batch transform fails with "upstream prematurely closed connection while reading response header from upstream"

Time: 2018-11-04 02:00:37

Tags: python nginx flask gunicorn amazon-sagemaker

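I have been trying to use a containerized machine learning model on AWS SageMaker through its batch transform service, which splits the whole dataset into smaller chunks and sends them to the model for inference.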

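The container runs a Flask service that serves the ML model behind gunicorn and nginx. When I run a batch transform, I get a 502 Bad Gateway error together with the following error in the logs. (When I run the same container with a 50k dataset as input, it succeeds on a c5.xlarge instance, but it fails in the same environment with an 80k dataset.)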

*4 upstream prematurely closed connection while reading response header from upstream, client: IP, server: , request: "POST /invocations HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/invocations", host: "IP:8080"

"POST /invocations HTTP/1.1" 502 182 "-" "Apache-HttpClient/4.5.x (Java/1.8.0_172)"

Nginx config:

worker_processes 1;
daemon off; # Prevent forking
pid  /tmp/nginx.pid;
error_log /var/log/nginx/error.log;
events {
    # defaults
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log combined;

    upstream gunicorn {
        server unix:/tmp/gunicorn.sock;
    }

    server {
        listen 8080 deferred;
        client_max_body_size 5m;

        keepalive_timeout 10000;

        location ~ ^/(ping|invocations) {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://gunicorn;
        }

        location / {
            return 404 "{}";
        }
    }
}

And the gunicorn config:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/serve

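I am new to nginx and gunicorn, and I have read most of the other posts about the "upstream prematurely closed connection while reading response header" error. I have tried things such as increasing the client body size, but I still get the same error. Any help would be much appreciated.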

1 answer:

Answer 0: (score: 1)

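This looks like a gunicorn worker timeout. There are two timeout settings you can adjust, depending on how long your model takes to serve an inference request: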

  1. The gunicorn worker timeout, which can be adjusted here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/serve#L25

  2. The nginx proxy_read_timeout setting, which can be added to nginx.conf here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/nginx.conf#L21-L37
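
As a rough illustration of the two settings above, here is a minimal sketch of how a launcher script might derive both timeouts from a single value. The environment variable name MODEL_SERVER_TIMEOUT, the gevent worker class, the wsgi:app module path, and both helper function names are assumptions made for this example and are not taken from the question; only the gunicorn flags (--timeout, -k, -b, -w) and the nginx proxy_read_timeout directive are standard.

```python
import os


def gunicorn_command(timeout_seconds=None, workers=1):
    """Build a gunicorn argv with an explicit worker timeout (hypothetical helper)."""
    if timeout_seconds is None:
        # Assumed variable name; falls back to 60 seconds when it is unset.
        timeout_seconds = int(os.environ.get("MODEL_SERVER_TIMEOUT", "60"))
    return [
        "gunicorn",
        "--timeout", str(timeout_seconds),  # how long a silent worker may run before it is killed
        "-k", "gevent",                     # assumed worker class
        "-b", "unix:/tmp/gunicorn.sock",    # same socket the nginx upstream block points at
        "-w", str(workers),
        "wsgi:app",                         # assumed WSGI module and application object
    ]


def nginx_timeout_directive(timeout_seconds):
    """Return the matching nginx directive for the proxying location block (hypothetical helper)."""
    return "proxy_read_timeout {}s;".format(timeout_seconds)


if __name__ == "__main__":
    print(" ".join(gunicorn_command(timeout_seconds=300, workers=2)))
    print(nginx_timeout_directive(300))
```

The two values should be raised together: if only the gunicorn timeout is increased, nginx can still give up on the upstream once its own read timeout (60 seconds by default) expires. The printed directive would go inside the location block that proxies /ping and /invocations.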

If you need support for a specific transform job, please visit the AWS forum: https://forums.aws.amazon.com/forum.jspa?forumID=285&start=0