Deploying a custom, trained PyTorch BERT model to run on GPU for inference

Posted: 2020-10-19 03:19:41

Tags: docker pytorch containers amazon-sagemaker bert-language-model

Thank you for taking the time to read this question.

I would like some advice on deploying a custom, trained PyTorch BERT model that runs on GPU for inference (no training involved; the model is already saved as a .pt file).
I have searched the various AWS docs and found links like these:
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/pytorch_extending_our_containers
https://github.com/aws-samples/amazon-sagemaker-bert-pytorch

First of all, I am not sure whether I even need to create a container for daily automated batch inference; in the last link I listed above, they don't create a container at all.

If I do need one: I have tried to follow the tutorials and create a container with the following structure:

container/

  • Dockerfile
  • build_and_push.sh
  • review-classification/
      • predictor.py
      • serve
      • wsgi.py
      • nginx.conf
      • other Python scripts supporting predictor.py
      • model/ (folder containing the saved .pt file)

However, I am currently confused about a few things:

  1. There are many Dockerfile examples online: some start from python, some from ubuntu, and some from the pytorch_training images in AWS's own accounts. I picked the base image that Hugging Face uses for pytorch-gpu: nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04. Question: does it matter which base image I use? Do I also need to write lines like ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/program and ENV SAGEMAKER_PROGRAM review-classification/serve, or do those only take effect when using the pytorch_training image?
  2. I used the build_and_push.sh file to create an image in my ECR, but how do I know whether it is set up correctly? (There is a small check sketched after build_and_push.sh below.)
  3. Does the serve code matter? Right now I took the serve code from the first link above. It says you normally don't need to modify anything in serve, but as far as I can tell its parameters are set up for CPU. Do I need to modify it for GPU, and if so, how? (A predictor-side sketch is included after the serve code below.)
  4. What should my next steps be? (A rough sketch of what I have in mind follows right after this list.)
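
To make question 4 a bit more concrete, here is a minimal sketch of what I imagine the next step could look like with the SageMaker Python SDK (v2): registering the pushed image as a Model and running a Batch Transform job on a GPU instance. The role ARN, image URI, and S3 paths are placeholders, not real values:

import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"  # placeholder

# The image built and pushed by build_and_push.sh (placeholder URI).
image_uri = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/product-review-repo:gpu-py36"

# The container already bundles the .pt file, so no model_data tarball is passed here;
# SageMaker still needs a Model object before it can run a transform job.
model = Model(image_uri=image_uri, role=role, sagemaker_session=session)

# Batch transform on a single GPU instance; instance type and paths are assumptions.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://<bucket>/review-classification/output/",  # placeholder
)

transformer.transform(
    data="s3://<bucket>/review-classification/input/",  # placeholder
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()

The daily schedule itself would then just be whatever triggers this once a day (a cron job, or a scheduled Lambda that calls create_transform_job), which is part of why I am unsure how much of the container plumbing is really needed.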

My current Dockerfile looks like this:

# https://hub.docker.com/r/huggingface/transformers-pytorch-gpu/dockerfile
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.0-gpu-py36-cu101-ubuntu16.04
# FROM 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-inference-pytorch:1.5.0f-gpu-py3

LABEL maintainer="bqge@amazon.com"
LABEL project="product-review-models"

RUN apt update && \
    apt install -y bash \
                   build-essential \
                   git \
                   curl \
                   wget \
                   nginx \
                   ca-certificates \
                   python3 \
                   python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Here we get all python packages.
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir \
        mkl \
        torch==1.5.0 \
        transformers==2.11.0 \
        path \
        sklearn \
        xlrd \
        spacy==2.1.0 \
        flask \
        gevent \
        gunicorn \
        pandas \
        ipython \
        neuralcoref==4.0 && \
    python3 -m spacy download en_core_web_md

# RUN rm -f /usr/bin/python && ln -s /usr/bin/python /usr/bin/python3
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

# Set up the program in the image
# /opt/ml and all subdirectories are utilized by SageMaker, we use the /code subdirectory to store our user code.
COPY /review-classification /opt/program
WORKDIR /opt/program

# this environment variable is used by the SageMaker PyTorch container to determine our user code directory.
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/program

# this environment variable is used by the SageMaker PyTorch container to determine our program entry point
# for training and serving.
ENV SAGEMAKER_PROGRAM review-classification/serve

ENTRYPOINT ["/usr/bin/python3", "/opt/program/serve"]

My current build_and_push.sh looks like this:

#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The name of our algorithm
algorithm_name=product-review-repo

# parameters
PY_VERSION="py36"


account=$(aws sts get-caller-identity --query Account --output text)

if [ $? -ne 0 ]
then
    exit 255
fi

cd SageMaker/container

chmod +x review-classification/serve

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

TAG="gpu-${PY_VERSION}"

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:${TAG}"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

echo "---> repository done.."
# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com

echo "---> logged in to account ecr.."

echo "Building image with arch=gpu, region=${region}"


# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}
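
For question 2 above, one small check I can think of (besides running the container locally and hitting /ping) is simply asking ECR whether the tag exists. A minimal sketch with boto3, using the repository and tag names from the script above:

import boto3

# Repository and tag as defined in build_and_push.sh above.
repository_name = "product-review-repo"
image_tag = "gpu-py36"

ecr = boto3.client("ecr", region_name="us-east-1")

# Raises an ImageNotFoundException if the tag was never pushed.
response = ecr.describe_images(
    repositoryName=repository_name,
    imageIds=[{"imageTag": image_tag}],
)

for detail in response["imageDetails"]:
    print(detail["imageTags"], detail["imagePushedAt"], detail["imageSizeInBytes"])

Of course this only confirms that the image landed in ECR; it says nothing about whether the serving stack inside it actually works.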

Here is my serve code:

#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

from __future__ import print_function
import multiprocessing
import os
import signal
import subprocess
import sys

cpu_count = multiprocessing.cpu_count()

model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))

def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass

    sys.exit(0)

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))


    # link the log streams to stdout/err so they will be logged to the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(model_server_timeout),
                                 '-k', 'gevent',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '-w', str(model_server_workers),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # If either subprocess exits, so do we.
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')

# The main routine just invokes the start function.

if __name__ == '__main__':
    start_server()
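
For context on question 3: the gunicorn command above loads wsgi:app, and in the scikit_bring_your_own layout wsgi.py just re-exports the Flask app defined in predictor.py. Below is a stripped-down, illustrative sketch of what a GPU-aware predictor.py could look like; the model path, tokenizer, input format, and label handling are assumptions, not my actual code:

# predictor.py (illustrative sketch only)
import flask
import torch
from transformers import BertTokenizer

MODEL_PATH = "/opt/ml/model/model.pt"  # assumed location of the saved .pt file

# Use the GPU when the container runs on a GPU instance, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumption
model = torch.load(MODEL_PATH, map_location=device)
model.eval()

app = flask.Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker calls /ping to check that the container is healthy.
    return flask.Response(response="\n", status=200, mimetype="application/json")

@app.route("/invocations", methods=["POST"])
def invocations():
    # SageMaker posts the inference payload here (mini-batches for batch transform).
    lines = flask.request.data.decode("utf-8").splitlines()
    encoded = tokenizer.batch_encode_plus(
        lines, max_length=128, pad_to_max_length=True, return_tensors="pt"
    )
    encoded = {name: tensor.to(device) for name, tensor in encoded.items()}
    with torch.no_grad():
        logits = model(**encoded)[0]
    predictions = logits.argmax(dim=-1).tolist()
    return flask.Response(
        response="\n".join(str(p) for p in predictions),
        status=200,
        mimetype="text/csv",
    )

Also, with several gunicorn workers each worker would load its own copy of the model onto the GPU, so on a single-GPU instance it probably makes sense to set MODEL_SERVER_WORKERS=1 instead of the CPU-core default (though I am not sure this is the recommended approach).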

Thank you very much!

0 Answers

No answers yet.