Strange symbols in IPython Notebook

Posted: 2016-03-24 07:19:10

Tags: azure encoding jupyter-notebook azure-machine-learning-studio

I use Cyrillic characters in my IPython notebooks. When I work in ML Studio, everything displays correctly.

But when I download the notebooks and open them elsewhere (for example on http://try.jupyter.org), I see strange characters.

The bad notebook (created in Azure ML Studio):

{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}

$ file bad.ipynb 
bad.ipynb: ASCII text, with very long lines, with no line terminators

The "good" version, created on http://try.jupyter.org:

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "тест"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}

$ file good.ipynb 
good.ipynb: UTF-8 Unicode text
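The escapes in the bad notebook can be reproduced exactly by decoding the UTF-8 bytes of "тест" as Latin-1 and serializing the result with json.dumps, which escapes all non-ASCII characters by default (ensure_ascii=True). This is a minimal Python 3 sketch of the suspected cause; it is an assumption about what went wrong during the export, not Azure ML Studio's actual code:

```python
import json

text = "тест"
# Encode to UTF-8 bytes, then wrongly decode those bytes as Latin-1:
# every byte becomes one code point in the range U+0000..U+00FF.
mojibake = text.encode("utf-8").decode("latin-1")
# json.dumps escapes every non-ASCII character as \uXXXX by default.
escaped = json.dumps(mojibake)
print(escaped)  # prints "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082"
```

Because the result is pure ASCII, `file` reports the bad notebook as ASCII text, while the good one stores the Cyrillic letters directly as UTF-8.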

1 answer:

Answer 0 (score: 1)

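I ran a few experiments: the \u00XX escapes in your JSON are the bytes of the UTF-8 encoding of your text, each stored as a separate character (for example, т is encoded in UTF-8 as the bytes D1 82, which appear as \u00d1\u0082). In your case it is easy to recover the real content. See the following code: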

Python 3.x

a = '{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}'

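# Python already turned each \u00XX escape in the literal into one character (U+0000..U+00FF);
# map those characters back to the bytes they came from and decode the bytes as UTF-8.
result = a.encode("latin-1").decode("utf-8")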

Python 2.x

a = '{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}'

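# In a Python 2 byte string the \u00XX sequences stay literal, so unescape them first
# (safe here only because this JSON contains no other backslash escapes);
# encoding back to latin-1 then yields the original UTF-8 bytes as a str.
result = a.decode("unicode-escape").encode("latin-1")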
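The snippets above operate on a pasted string literal. To repair the downloaded file itself, one option is to parse it with the json module (which turns the \u00XX escapes into characters and handles other escapes such as \n correctly), fix every string, and write it back as real UTF-8. This is a minimal Python 3 sketch; the function names and the output file name fixed.ipynb are made up for illustration, and it assumes every string in the notebook was mis-decoded the same way (a string that already contains correct Cyrillic text would raise UnicodeEncodeError):

```python
import json

def fix_text(s):
    # Map each code point U+0000..U+00FF back to the byte with the same
    # value, then decode those bytes as UTF-8.
    return s.encode("latin-1").decode("utf-8")

def fix_obj(obj):
    # Recursively repair every string (keys and values) in the parsed JSON.
    if isinstance(obj, str):
        return fix_text(obj)
    if isinstance(obj, list):
        return [fix_obj(item) for item in obj]
    if isinstance(obj, dict):
        return {fix_text(key): fix_obj(value) for key, value in obj.items()}
    return obj  # numbers, booleans, None

def repair_notebook(src_path, dst_path):
    with open(src_path, encoding="utf-8") as f:
        nb = json.load(f)  # \u00XX escapes become single characters here
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes the Cyrillic text as UTF-8, not \uXXXX;
        # indent=1 and sort_keys=True mimic the layout of the good notebook.
        json.dump(fix_obj(nb), f, ensure_ascii=False, indent=1, sort_keys=True)
```

Usage: repair_notebook("bad.ipynb", "fixed.ipynb"). ASCII-only strings pass through unchanged, because the Latin-1/UTF-8 round trip is the identity for them.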

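This code may not work in other cases. It assumes every mis-decoded character lies in the range U+0000 to U+00FF, which Python's 'latin-1' codec maps one-to-one onto bytes 0-255. If the text was mis-decoded with a different codec (for example cp1252, which maps some bytes in 0x80-0x9F to characters such as € and ‚), or a string contains characters above U+00FF, encode('latin-1') raises UnicodeEncodeError. So I am still looking for a better approach for this kind of problem.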
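One way to make the conversion less brittle is to attempt it and fall back gracefully. The sketch below (Python 3; the function name and the candidate codecs are assumptions, not a general solution) tries Latin-1 first, then cp1252, and returns the string unchanged when neither round trip produces valid UTF-8. It can still misfire on genuine Latin-1 text whose bytes happen to form valid UTF-8, such as a literal "Ã©"; the third-party ftfy library (ftfy.fix_text) uses heuristics for cases like that.

```python
def fix_text_safe(s, candidates=("latin-1", "cp1252")):
    # Try each codec the text may have been wrongly decoded with.
    for codec in candidates:
        try:
            # Recover the original bytes, then decode them as UTF-8.
            return s.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not representable in this codec, or not valid UTF-8 afterwards.
            continue
    # Nothing worked: assume the string was never mis-decoded.
    return s
```

For example, fix_text_safe("тест") returns "тест" unchanged, because Cyrillic letters cannot be encoded as Latin-1 or cp1252.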