How do I import a notebook from my local machine into the Azure Databricks portal?

Date: 2017-12-08 10:25:50

Tags: azure curl databricks spark-notebook

How do I import a locally stored notebook into Azure Databricks?

I have a sample notebook in DBC format on my local machine, and I need to import it through the Notebook (Workspace) REST API.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOUR_DOMAIN/api/2.0/workspace/import <<JSON
{
  "path": "/Users/user@example.com/new-notebook",
  "format": "SOURCE",
  "language": "SCALA",
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg==",
  "overwrite": "false"
}
JSON

See this doc.

The docs cover the target file path, but they never mention a source file path; the file is supplied as content instead. So how do I attach the source file when importing a notebook?

2 answers:

Answer 0 (score: 3)

If you have a DBC file, then the format needs to be DBC and the language field is ignored.

Also, the content property needs to be the DBC file's bytes, Base64 encoded, per the docs:

The content parameter contains base64 encoded notebook content

If you're using bash, you can simply run base64 notebook.dbc to produce that string.
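The same steps can be sketched in Python: read the DBC archive, Base64-encode its bytes, and build the JSON body for the import endpoint. This is a minimal sketch, not the library's own helper; the build_import_payload function name is my own, and the domain and token in the commented request are placeholders.

```python
import base64


def build_import_payload(dbc_path, target_path):
    """Read a local DBC archive and build the JSON body for
    POST /api/2.0/workspace/import."""
    with open(dbc_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("ascii")
    return {
        "path": target_path,
        "format": "DBC",      # for DBC archives the language field is ignored
        "content": content,   # base64-encoded file bytes, as the docs require
        "overwrite": False,
    }


# Sending the request (YOUR_DOMAIN and API_TOKEN are placeholders):
# import requests
# resp = requests.post(
#     "https://YOUR_DOMAIN/api/2.0/workspace/import",
#     headers={"Authorization": f"Bearer {API_TOKEN}"},
#     json=build_import_payload("notebook.dbc",
#                               "/Users/user@example.com/new-notebook"),
# )
# resp.raise_for_status()
```

The actual HTTP call is left commented out so the encoding step can be tried on its own.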

Answer 1 (score: 0)

The reason the source file path is ignored is that you are expected to convert the file to base64 and put that string in content. The path therefore becomes irrelevant.

If you don't want to do that and don't mind using curl, the docs also say you can handle it like this:

curl -n -F path=/Users/user@example.com/project/ScalaExampleNotebook -F language=SCALA \
  -F content=@example.scala \
  https://<databricks-instance>/api/2.0/workspace/import

Otherwise, if you happen to be looking for how to import an entire directory... I spent a few hours searching for that myself. It uses the databricks-cli library (in Python).

$ pip install databricks-cli, then

from databricks_cli.workspace.api import WorkspaceApi
from databricks_cli.sdk.api_client import ApiClient


client = ApiClient(
    host='https://your.databricks-url.net',
    token=api_key  # api_key: your Databricks personal access token
)
workspace_api = WorkspaceApi(client)
workspace_api.import_workspace_dir(
    source_path="/your/dir/here/MyProject",
    target_path="/Users/user@example.com/MyProject",
    overwrite=True,
    exclude_hidden_files=True
)