Azure Databricks cluster init script - installing a Python wheel

Time: 2020-04-07 10:02:03

Tags: python bash azure databricks azure-databricks

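I have a Python script that mounts a storage account in Databricks and then installs a wheel from that storage account. I am trying to run it as a cluster init script, but it keeps failing. My script has the form: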

#/databricks/python/bin/python
mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)
dbutils.library.install("dbfs:/mnt/.....")
dbutils.library.restartPython()

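It works when I run it directly in a notebook, but if I save it to a file named dbfs:/databricks/init_scripts/datalakes/init.py and use it as a cluster init script, the cluster fails to start and the error message says the init script had a non-zero exit status. I checked the logs, and the script appears to be run as bash rather than Python: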

bash: line 1: mount_point: command not found

I then tried running the Python script from a bash script, init.bash, containing the following single line:

/databricks/python/bin/python "dbfs:/databricks/init_scripts/datalakes/init.py"

The cluster using init.bash then fails to start, and the logs show that the Python file cannot be found:

/databricks/python/bin/python: can't open file 'dbfs:/databricks/init_scripts/datalakes/init.py': [Errno 2] No such file or directory

Can someone tell me how to get this working?

Related question: Azure Databricks cluster init script - Install wheel from mounted storage

1 answer:

Answer 0 (score: 1)

The solution I settled on was to run a notebook that mounts the storage and creates a bash init script that only installs the wheel. Something like this:

mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)

dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", """
        /databricks/python/bin/pip install "../../../dbfs/mnt/package-source/parser-3.0-py3-none-any.whl"
""", True)
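
For reference, the same idea can be sketched with the script text built by a small helper. The "No such file or directory" error in the question is consistent with a local process not understanding dbfs:/ URIs; on cluster nodes DBFS is normally exposed under the local /dbfs directory, which is also what the pip command above relies on. The helper names below (dbfs_uri_to_local_path, build_init_script) are hypothetical, and the sketch assumes the /dbfs mount and the mounted storage are already available when the init script runs, which I have not verified for every runtime.

```python
import shlex


def dbfs_uri_to_local_path(uri: str) -> str:
    """Map 'dbfs:/a/b' to '/dbfs/a/b', the path a local process would use.

    Assumption: DBFS is exposed on the node under /dbfs.
    """
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


def build_init_script(wheel_uri: str,
                      pip: str = "/databricks/python/bin/pip") -> str:
    """Return the text of a bash init script that pip-installs one wheel."""
    wheel_path = dbfs_uri_to_local_path(wheel_uri)
    # shlex.quote protects the path if it ever contains spaces or shell
    # metacharacters; a plain path like the one below is left unchanged.
    return (
        "#!/bin/bash\n"
        "set -e\n"  # stop at the first failing command
        f"{pip} install {shlex.quote(wheel_path)}\n"
    )


script = build_init_script(
    "dbfs:/mnt/package-source/parser-3.0-py3-none-any.whl"
)
print(script)

# In a notebook, where dbutils is defined, the script could then be written
# with the same call the answer uses:
# dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", script, True)
```

Building the string separately makes it easy to print and check the script before pointing the cluster at it, and avoids the nested-quote problems that come with embedding a quoted path inside a triple-quoted string.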