Thank you for taking the time to read this question.
I am looking for advice on deploying a custom, already-trained PyTorch BERT model for GPU inference (no training involved; the model is saved as a .pt file).
I searched through various AWS documentation and examples and found the following links:
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/pytorch_extending_our_containers
https://github.com/aws-samples/amazon-sagemaker-bert-pytorch
First of all, I am not sure whether I even need to build my own container for a daily automated batch inference job. I included the last link above precisely because that example does not build a custom container at all (see the sketch right below for how I understand that route).
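If the no-container route is viable, my rough understanding of it (based on the third link) is something like the following sketch with the SageMaker Python SDK: the prebuilt PyTorch serving image loads my model.tar.gz plus a small inference.py, and a Transformer runs the batch job. The bucket paths, the entry_point name and the instance type here are placeholders I made up, not anything taken from the examples:

import sagemaker
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()

# model.tar.gz would contain my .pt file plus code/inference.py
# (model_fn / input_fn / predict_fn / output_fn); paths below are placeholders.
pytorch_model = PyTorchModel(
    model_data='s3://my-bucket/bert/model.tar.gz',
    role=role,
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5.0',
    py_version='py3',
)

transformer = pytorch_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',      # GPU instance
    strategy='MultiRecord',
    output_path='s3://my-bucket/bert/batch-output/',
)

# This is the job I would schedule to run once a day.
transformer.transform(
    data='s3://my-bucket/bert/batch-input/',
    content_type='application/jsonlines',
    split_type='Line',
)
transformer.wait()

Is that the recommended way for a purely daily batch use case, or is a custom container still worth it because of the GPU / transformers / neuralcoref dependencies?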
If a custom container is the way to go, I have tried to follow the tutorials and created a container directory with the following structure:
container
    Dockerfile
    build_and_push.sh
    review-classification
However, there are a few things I am currently confused about.
My current Dockerfile looks like this:
# https://hub.docker.com/r/huggingface/transformers-pytorch-gpu/dockerfile
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.0-gpu-py36-cu101-ubuntu16.04
# FROM 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-inference-pytorch:1.5.0f-gpu-py3
LABEL maintainer="bqge@amazon.com"
LABEL project="product-review-models"
RUN apt update && \
    apt install -y bash \
        build-essential \
        git \
        curl \
        wget \
        nginx \
        ca-certificates \
        python3 \
        python3-pip && \
    rm -rf /var/lib/apt/lists/*
# Here we get all python packages.
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir \
        mkl \
        torch==1.5.0 \
        transformers==2.11.0 \
        path \
        scikit-learn \
        xlrd \
        spacy==2.1.0 \
        flask \
        gevent \
        gunicorn \
        pandas \
        ipython \
        neuralcoref==4.0 && \
    python3 -m spacy download en_core_web_md
# RUN rm -f /usr/bin/python && ln -s /usr/bin/python3 /usr/bin/python
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
# Set up the program in the image
# /opt/ml and all subdirectories are utilized by SageMaker; we put our user code under /opt/program.
COPY review-classification /opt/program
WORKDIR /opt/program
# this environment variable is used by the SageMaker PyTorch container to determine our user code directory.
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/program
# this environment variable is used by the SageMaker PyTorch container to determine our program entry point
# for training and serving.
ENV SAGEMAKER_PROGRAM serve
ENTRYPOINT ["/usr/bin/python3", "/opt/program/serve"]
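Before pushing the image, my plan is to smoke-test it locally with something like the snippet below. This assumes the container is already running locally (e.g. docker run --rm -p 8080:8080 -v $(pwd)/model:/opt/ml/model product-review-repo:gpu-py36) and that the Flask app behind wsgi:app exposes the standard SageMaker /ping and /invocations endpoints; the request payload is just a made-up placeholder for my review data:

import json
import requests

# the container is assumed to be running locally and listening on port 8080
base_url = 'http://localhost:8080'

# SageMaker calls GET /ping to check that the container is healthy
ping = requests.get(base_url + '/ping')
print('ping:', ping.status_code)

# Inference requests go to POST /invocations; this payload is a made-up example
payload = {'reviews': ['great product, would buy again']}
response = requests.post(
    base_url + '/invocations',
    data=json.dumps(payload),
    headers={'Content-Type': 'application/json'},
)
print('invocations:', response.status_code, response.text)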
My current build_and_push.sh looks like this:
#!/usr/bin/env bash
# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.
# The name of our algorithm
algorithm_name=product-review-repo
# parameters
PY_VERSION="py36"
account=$(aws sts get-caller-identity --query Account --output text)
if [ $? -ne 0 ]
then
    exit 255
fi
cd SageMaker/container
chmod +x review-classification/serve
# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}
TAG="gpu-${PY_VERSION}"
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:${TAG}"
# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi
echo "---> repository done.."
# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"
echo "---> logged in to account ecr.."
echo "Building image with arch=gpu, region=${region}"
# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}
docker push ${fullname}
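Assuming the push succeeds, my understanding is that the daily batch job could then be created from the ECR image roughly like this (again only a sketch; the S3 paths and content type are placeholders, and I am using the generic Model class with image_uri because the container brings its own serving stack):

import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()
session = sagemaker.Session()
account = session.boto_session.client('sts').get_caller_identity()['Account']
region = session.boto_region_name

image_uri = f'{account}.dkr.ecr.{region}.amazonaws.com/product-review-repo:gpu-py36'

# model_data points at the packaged .pt file; SageMaker extracts it to /opt/ml/model
model = Model(
    image_uri=image_uri,
    model_data='s3://my-bucket/bert/model.tar.gz',
    role=role,
    sagemaker_session=session,
)

transformer = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path='s3://my-bucket/bert/batch-output/',
)

transformer.transform(
    data='s3://my-bucket/bert/batch-input/',
    content_type='application/json',
    split_type='Line',
)
transformer.wait()

The daily scheduling itself (e.g. an EventBridge rule or a Lambda kicking off the transform job) is not the part I am worried about; it is mainly whether this container setup is the right approach.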
Below is my serve script:
#!/usr/bin/env python
# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter Environment Variable Default Value
# --------- -------------------- -------------
# number of workers MODEL_SERVER_WORKERS the number of CPU cores
# timeout MODEL_SERVER_TIMEOUT 60 seconds
from __future__ import print_function
import multiprocessing
import os
import signal
import subprocess
import sys
cpu_count = multiprocessing.cpu_count()
model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))
def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass

    sys.exit(0)


def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))

    # link the log streams to stdout/err so they will be logged to the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(model_server_timeout),
                                 '-k', 'gevent',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '-w', str(model_server_workers),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # If either subprocess exits, so do we.
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')


# The main routine just invokes the start function.
if __name__ == '__main__':
    start_server()
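For completeness: the serve script above expects a wsgi:app module next to it. Mine is modeled on the scikit_bring_your_own predictor, roughly like the simplified sketch below (the model file name, the way the .pt file is loaded, and the dummy predictions are placeholders, not my real code):

# wsgi.py -- simplified sketch of the Flask app that gunicorn loads as wsgi:app
import json
import os

import flask
import torch

MODEL_DIR = '/opt/ml/model'   # SageMaker extracts model.tar.gz here

app = flask.Flask(__name__)

# Load the saved BERT model once per worker; 'model.pt' is a placeholder name.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load(os.path.join(MODEL_DIR, 'model.pt'), map_location=device)
model.eval()


@app.route('/ping', methods=['GET'])
def ping():
    # Health check: SageMaker expects HTTP 200 when the container is ready.
    status = 200 if model is not None else 404
    return flask.Response(response='\n', status=status, mimetype='application/json')


@app.route('/invocations', methods=['POST'])
def invocations():
    data = json.loads(flask.request.data.decode('utf-8'))
    reviews = data.get('reviews', [])
    with torch.no_grad():
        # Placeholder: the real code tokenizes the reviews and runs model(**inputs).
        predictions = [0 for _ in reviews]
    return flask.Response(response=json.dumps({'predictions': predictions}),
                          status=200,
                          mimetype='application/json')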
Thank you very much!