Question

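I use Cyrillic characters in my IPython notebooks. They display fine while I work in ML Studio.

But when I download the notebooks and open them elsewhere (for example at http://try.jupyter.org), I see strange characters.

The bad notebook (created in Azure ML Studio):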

{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}

$ file bad.ipynb 
bad.ipynb: ASCII text, with very long lines, with no line terminators

The "good" version, created at http://try.jupyter.org:

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "тест"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}

$ file good.ipynb 
good.ipynb: UTF-8 Unicode text

Answer 1

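I ran a few experiments and found that the text in your JSON was encoded as UTF-8, with each resulting byte then written out as a separate \u00XX escape. In your case, recovering the real content is simple. See the following code: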

Python 3.x

a = '{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}'

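# Each \u00XX code point stands for one UTF-8 byte: latin-1 maps the
# code points back to bytes, which are then decoded as UTF-8.
result = a.encode('latin-1').decode("utf-8")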

Python 2.x

a = '{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}'

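# Here a is a byte string holding literal \uXXXX escapes: unicode-escape turns
# them into code points, and latin-1 maps those back to the UTF-8 bytes.
result = a.decode('unicode-escape').encode("latin-1")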

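This code may not work in some other cases: 'latin-1' only covers code points 0-255, so encoding fails as soon as the text contains any character above U+00FF (for example, Cyrillic that is already stored correctly). So I am still looking for a more robust approach to this kind of problem.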
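To apply the same round trip to a whole notebook file rather than a pasted string, something like the following Python 3 sketch might work. The helper names `fix_mojibake`, `fix_node`, and `fix_notebook` are made up for illustration, and the sketch assumes every damaged string has the same form as in the question (UTF-8 bytes stored as code points U+0000 to U+00FF). Strings that do not survive the round trip are left untouched.

```python
import json


def fix_mojibake(text):
    """Undo UTF-8 bytes that were stored as Latin-1 code points."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not the expected kind of damage (e.g. already correct Cyrillic):
        # keep the string as it is.
        return text


def fix_node(node):
    """Recursively repair every string value in parsed JSON data."""
    if isinstance(node, str):
        return fix_mojibake(node)
    if isinstance(node, list):
        return [fix_node(item) for item in node]
    if isinstance(node, dict):
        return {key: fix_node(value) for key, value in node.items()}
    return node  # numbers, booleans, None


def fix_notebook(src_path, dst_path):
    """Read a notebook, repair its strings, and write it back as UTF-8."""
    with open(src_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes real Cyrillic instead of \uXXXX escapes.
        json.dump(fix_node(notebook), f, ensure_ascii=False, indent=1)
```

For the files in the question this would be called as `fix_notebook("bad.ipynb", "fixed.ipynb")`, where `fixed.ipynb` is just an example output name. Because each string is repaired independently, one unusual cell does not abort the whole conversion, but a string that happens to be valid in both readings would still be converted, so it is worth checking the result.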
