I am trying to copy a file to Azure Databricks DBFS through an Azure DevOps pipeline. Below is a snippet of the yml file I am using:
stages:
- stage: MYBuild
  displayName: "My Build"
  jobs:
  - job: BuildwhlAndRunPytest
    pool:
      vmImage: 'ubuntu-16.04'
    steps:
      - task: UsePythonVersion@0
        displayName: 'Use Python 3.7'
        inputs:
          versionSpec: '3.7'
          addToPath: true
          architecture: 'x64'
      - script: |
          pip install pytest requests setuptools wheel pytest-cov
          pip install -U databricks-connect==7.3.*
        displayName: 'Load Python Dependencies'
      - checkout: self
        persistCredentials: true
        clean: true
      - script: |
          echo "y
          $(databricks-host)
          $(databricks-token)
          $(databricks-cluster)
          $(databricks-org-id)
          8787" | databricks-connect configure
          databricks-connect test
        env:
          databricks-token: $(databricks-token)
        displayName: 'Configure DBConnect'
      - script: |
          databricks fs cp test-proj/pyspark-lib/configs/config.ini dbfs:/configs/test-proj/config.ini
At the stage that invokes the databricks fs cp command, I get the following error:
/home/vsts/work/_temp/2278f7d5-1d96-4c4e-a501-77c07419773b.sh: line 7: databricks: command not found
However, when I run databricks-connect test, it executes successfully. Please help if I am missing a step somewhere.
Answer 0 (score: 1)
The databricks command is provided by the databricks-cli package, not by databricks-connect, so you need to change your pip install command accordingly.
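As a quick local sanity check, installing the CLI and confirming that the databricks entry point is on the PATH would look roughly like this (run in the same Python environment the pipeline uses):

pip install -U databricks-cli
databricks --version    # should print the CLI version instead of "command not found"
which databricks        # shows where the entry point was installed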
Also, for the databricks command you only need to set the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN, like this:
- script: |
    pip install pytest requests setuptools wheel
    pip install -U databricks-cli
  displayName: 'Load Python Dependencies'
- script: |
    databricks fs cp ... dbfs:/...
  env:
    DATABRICKS_HOST: $(DATABRICKS_HOST)
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
  displayName: 'Copy artifacts'
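If the copy step still fails, a rough way to verify the host/token pair outside the pipeline is to export the same two variables in a local shell and list the target DBFS folder (the values below are placeholders, not values from the question):

export DATABRICKS_HOST="https://<your-workspace>.azuredatabricks.net"   # placeholder workspace URL
export DATABRICKS_TOKEN="<personal-access-token>"                       # placeholder PAT
databricks fs ls dbfs:/configs/test-proj                                # lists the folder if authentication works

Note that if the token is defined as a secret pipeline variable, the explicit env: mapping shown above is required, because Azure DevOps does not expose secret variables to script steps automatically.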
P.S. Here is an example of how to do CI/CD on Databricks + notebooks. You may also be interested in the cicd-templates project.