How do I avoid KeyError: 'kernelspec' in Papermill?

Asked: 2019-05-07 13:54:25

Tags: python python-3.x jupyter-notebook jupyter papermill

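I am running a Papermill command from Airflow (in Docker). The script is stored on S3, and I run it with Papermill's Python client. It fails with an error I cannot make sense of: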

Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/ipython_genutils/ipstruct.py", line 132, in __getattr__
result = self[key]
KeyError: 'kernelspec'

I searched the documentation, but to no avail.

This is the code I use to run the Papermill command:

import time
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from mypackage.datastore import db
from mypackage.workflow.transform.jupyter_notebook import run_jupyter_notebook


dag_id = "jupyter-test-dag"
default_args = {
    'owner': "aviral",
    'depends_on_past': False,
    'start_date': "2019-02-28T00:00:00",
    'email': "aviral@some_org.com",
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True
}

dag = DAG(
    dag_id,
    catchup=False,
    default_args=default_args,
    schedule_interval=None,
    max_active_runs=1
)


def print_context(ds, **kwargs):
    print(kwargs)
    print(ds)
    return 'Whatever you return gets printed in the logs'


def run_python_jupyter(**kwargs):
    run_jupyter_notebook(
        script_location=kwargs["script_location"]
    )


create_job_task = PythonOperator(
    task_id="create_job",
    python_callable=run_python_jupyter,
    dag=dag,
    op_kwargs={
            "script_location": "s3://some_bucket/python3_file_write.ipynb"
    }
)

globals()[dag_id] = dag

The function run_jupyter_notebook is:

import papermill as pm


def run_jupyter_notebook(**kwargs):
    """Runs Jupyter notebook"""
    script_location = kwargs.get('script_location', '')
    if not script_location:
        raise ValueError(
            "Script location was not provided."
        )
    # Write the executed notebook next to the input, with an "_output" suffix.
    pm.execute_notebook(script_location, script_location.split(
        '.ipynb')[0] + "_output" + ".ipynb")

I expected the code to run without problems, because I have also run it locally (using a local filesystem path instead of the S3 path).

1 answer:

Answer 0: (score: 0)

Jupyter adds metadata to your notebook. Your error means that the metadata expected under the kernelspec key is missing.

An example of the kernelspec object in notebook metadata:

"kernelspec": {
        "display_name": "python3",
        "language": "python",
        "name": "python3"
    }

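So, to fix the error, you need to correct the notebook metadata so that it contains a valid kernelspec object. The simplest way is to edit the notebook's JSON document and add a kernelspec object inside the top-level metadata object.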

"metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   }
  }
}
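If you would rather not edit the JSON by hand, the same change can be scripted. The following is only a minimal sketch using the standard library: the helper name ensure_kernelspec and the DEFAULT_KERNELSPEC values are assumptions made for illustration (they are not part of Papermill), and the kernel name must match a kernel that is actually installed where the notebook runs.

```python
import json
from pathlib import Path

# Assumed default for illustration; "name" must match a kernel that is
# installed in the environment where Papermill executes the notebook.
DEFAULT_KERNELSPEC = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}


def ensure_kernelspec(notebook_path, kernelspec=None):
    """Add a kernelspec to the notebook metadata if it is missing.

    Hypothetical helper. Returns True if the file was modified,
    False if a kernelspec was already present.
    """
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))

    # "metadata" is the top-level object that holds "kernelspec".
    metadata = notebook.setdefault("metadata", {})
    if "kernelspec" in metadata:
        return False

    metadata["kernelspec"] = dict(kernelspec or DEFAULT_KERNELSPEC)
    path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return True
```

This works on a local file only. For a notebook stored on S3, as in the question, you would have to download it, patch it, and upload it again before running Papermill.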

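Your error may be caused by a tool that strips output from notebooks, such as the nbstripout Python package. If that is the case, change the nbstripout settings as described in its documentation.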