Question

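I have a scikit-learn k-means model. I train the model and save it to a pickle file so that I can deploy it later with the Azure ML library. The model uses a custom feature encoder named MultiColumnLabelEncoder. The pipeline is defined as follows: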

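# Imports needed by the pipeline
import joblib
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
# MultiColumnLabelEncoder is the custom encoder class, defined in a separate script

# Pipeline
kmeans = KMeans(n_clusters=3, random_state=0)
pipe = Pipeline([
    ("encoder", MultiColumnLabelEncoder()),
    ("k-means", kmeans),
])
# Train the pipeline
model = pipe.fit(visitors_df)
prediction = model.predict(visitors_df)
# Save the model in pickle/joblib format
filename = 'k_means_model.pkl'
joblib.dump(model, filename)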

Saving the model works fine. The deployment steps are the same as the steps in this link:

https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.ipynb

However, the deployment always fails with this error:

  File "/var/azureml-server/create_app.py", line 3, in <module>
    from app import main
  File "/var/azureml-server/app.py", line 27, in <module>
    import main as user_main
  File "/var/azureml-app/main.py", line 19, in <module>
    driver_module_spec.loader.exec_module(driver_module)
  File "/structure/azureml-app/score.py", line 22, in <module>
    importlib.import_module("multilabelencoder")
  File "/azureml-envs/azureml_b707e8c15a41fd316cf6c660941cf3d5/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'multilabelencoder'

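I understand that pickle/joblib has some problems unpickling the custom class MultiColumnLabelEncoder. That is why I defined this class in a separate Python script (which I also executed). I use this custom class in the training script, in the deployment script, and in the scoring file (score.py). The import in score.py does not succeed. So my question is: how can I import a custom Python module into the Azure ML deployment environment?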

Thank you.

EDIT: This is my .yml file

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - multilabelencoder==1.0.4
  - scikit-learn
  - azureml-defaults==1.0.74.*
  - pandas
channels:
- conda-forge

Answer 1

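In fact, the solution was to import my custom class MultiColumnLabelEncoder as a pip package (you can install it with pip install multilabelencoder==1.0.5). I then passed the pip package to the .yml file or to the InferenceConfig of the Azure ML environment. In the score.py file, I imported the class as follows: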

import os
import joblib
from multilabelencoder import multilabelencoder

def init():
    global model

    # Instantiate the custom encoder so the class is available when unpickling the model
    encoder = multilabelencoder.MultiColumnLabelEncoder()
    # Get the path where the deployed model can be found.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'k_means_model_45.pkl')
    model = joblib.load(model_path)

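After that, the deployment succeeded. One more important point: I had to use the same pip package (multilabelencoder) in the training pipeline, as shown here: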

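from sklearn.pipeline import Pipeline
from multilabelencoder import multilabelencoder

# columns and kmeans are assumed to be defined earlier in the training script
pipe = Pipeline([
    ("encoder", multilabelencoder.MultiColumnLabelEncoder(columns)),
    ("k-means", kmeans),
])
# Train the pipeline
trainedModel = pipe.fit(df)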
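
The reason the same package matters on both sides is that pickle (and joblib, which builds on it) stores only a reference to the class by module and class name, not the class code itself. The following is a minimal sketch of that behavior using only the standard library. The module name demo_encoder_pkg and the class DemoEncoder are hypothetical stand-ins, not the real multilabelencoder package.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a custom encoder package (not the real multilabelencoder)
MODULE_NAME = "demo_encoder_pkg"
demo_module = types.ModuleType(MODULE_NAME)
DemoEncoder = type("DemoEncoder", (), {"__module__": MODULE_NAME})
demo_module.DemoEncoder = DemoEncoder
sys.modules[MODULE_NAME] = demo_module

# "Training" side: pickling works because the module is importable here
payload = pickle.dumps(DemoEncoder())
# The payload contains the module name, i.e. a reference rather than the class code
stores_reference_only = MODULE_NAME.encode() in payload

# "Scoring" side without the package installed: the module cannot be imported
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = str(exc)

# "Scoring" side with the package installed: loading works again
sys.modules[MODULE_NAME] = demo_module
restored = pickle.loads(payload)
```

In this sketch, loading fails with the same kind of ModuleNotFoundError as in the question whenever the module named in the pickle is not importable, and it succeeds once a module with the same name and class is available again.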

Answer 2

I was facing the same problem: I tried to deploy a model that depends on some of my own scripts and got this error message:

 ModuleNotFoundError: No module named 'my-own-module-name'

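I found this "private wheel files" solution in the MS documentation, and it works. The difference from the solution above is that I no longer need to publish my scripts to pip. I think many people may be in the same situation, where for some reason you cannot or do not want to publish your scripts. Instead, your own wheel file is stored in your own blob storage.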

Following the documentation, I carried out the steps below, and they worked for me. I can now deploy a model that depends on my own scripts.

1) Package the scripts that the model depends on into a wheel file, and save the wheel file locally.

"your_path/your-wheel-file-name.whl"

2) Follow the instructions for the "private wheel files" solution in the MS documentation. Below is the code that worked for me.

from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.environment import Environment

# ws is an existing azureml.core.Workspace object
# During environment creation the service replaces the URL by secure SAS URL, so your wheel file is kept private and secure
whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="your_path/your-wheel-file-name.whl")
myenv = Environment(name="myenv")

# Collect the pip dependencies, including the private wheel, in a separate object
conda_dep = CondaDependencies()
conda_dep.add_pip_package("scikit-learn==0.22.1")
conda_dep.add_pip_package("azureml-defaults")
conda_dep.add_pip_package(whl_url)
myenv.python.conda_dependencies = conda_dep

# Write the dependencies to a conda environment file
with open("myenv.yml", "w") as f:
    f.write(conda_dep.serialize_to_string())

My environment file now looks like this:

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - scikit-learn==0.22.1
  - azureml-defaults
  - https://myworkspaceid.blob.core/azureml/Environment/azureml-private-packages/my-wheel-file-name.whl
channels:
- conda-forge
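
If the deployment still reports a ModuleNotFoundError, it may help to check which modules are actually importable before loading the model. The following is a small standard-library sketch of such a check. The helper name missing_modules is hypothetical, and calling it at the top of score.py is only a suggestion, not part of the Azure ML API.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Example: "json" is part of the standard library, the second name is made up
not_found = missing_modules(["json", "surely_not_an_installed_module_xyz"])
```

Logging the result of a check like this from score.py would show whether the private wheel was installed into the deployed environment under the module name that the pickled model expects.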

I am new to Azure ML and learning by talking with the community. This solution works well for me; I hope it helps you too.

Importing a custom Python module in the Azure ML deployment environment
