Question

Is there a simple way to dump UTF-8 data from a database?

I know this command:

manage.py dumpdata > mydata.json

But in the resulting mydata.json, the Unicode data looks like this:

"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"

I would like to see real Unicode text, such as 全球卫星定位系统 (Chinese).

Answer 1

After struggling with a similar problem, I just found that the XML serializer handles UTF-8 correctly.

manage.py dumpdata --format=xml > output.xml

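I had to move data from Django 0.96 to Django 1.3. After many attempts at dumping and loading the data, I finally succeeded with XML. No side effects so far.

I hope this helps someone, since I landed on this thread while looking for a solution.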

Answer 2

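django-admin.py dumpdata yourapp can dump the data for this purpose.

Or, if you use MySQL, you can dump the whole database with the mysqldump command.

This thread lists many ways to dump data, including manual ones.

Update: the OP has edited the question.

To convert the JSON-escaped strings into human-readable text, you can use (Python 2):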

open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))

Answer 3

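You either need to find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False, then encode the result, or you need to use json.load*() to load the JSON and then dump it again with that option.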
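
A minimal sketch of the second approach (load the JSON, then dump it again), assuming the dump is valid JSON. The function name rewrite_json_unescaped and its parameters are illustrative and not part of the original answer:

```python
import json


def rewrite_json_unescaped(src_path, dst_path, indent=2):
    """Re-serialize a JSON file so non-ASCII characters are written literally."""
    # \uXXXX escapes are plain ASCII, so reading the dump as UTF-8 is safe.
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    # ensure_ascii=False keeps characters as-is, so the output must be UTF-8.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, ensure_ascii=False, indent=indent)
```

For the files in the question, this would be called as rewrite_json_unescaped("mydata.json", "mydata-new.json").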

Answer 4

I wrote a snippet for that here. Works for me!

Answer 5

YOU provided a good answer, and it was accepted, but keep in mind that Python 3 distinguishes text from binary data, so both files must be opened in binary mode:

open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))

Otherwise, the error AttributeError: 'str' object has no attribute 'decode' is raised.
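
One caveat that may be worth checking before relying on this one-liner: unicode_escape is Python's string-literal codec, not a JSON decoder, so it also rewrites JSON escapes such as \" and reads any raw non-ASCII byte as Latin-1. A small sketch with made-up sample data (not taken from the original posts) that compares it with a JSON round trip:

```python
import json

# Made-up fixture line: the value contains an escaped double quote.
raw = b'{"name": "\\u4e1c\\u6cf0 \\"Ltd\\""}'

# The codec turns \" into a bare quote, so the result is no longer valid JSON.
via_codec = raw.decode("unicode_escape")
try:
    json.loads(via_codec)
    codec_output_is_valid = True
except json.JSONDecodeError:
    codec_output_is_valid = False

# A JSON round trip keeps structural escapes and only un-escapes the text.
via_json = json.dumps(json.loads(raw), ensure_ascii=False)
```

If the dumped data never contains escaped quotes, backslashes, or newlines, the one-liner and the round trip should give equivalent results.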

Answer 6

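# Python 2 only: str.decode() and the 'string-escape' codec are gone in Python 3.
import codecs
src = "/categories.json"
dst = "/categories-new.json"
# Read the dump and decode its backslash escapes.
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)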

Answer 7

You can create your own serializer that passes the ensure_ascii=False argument to the json.dumps function:

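# serializers/json_no_uescape.py
# The star import also exposes Deserializer from this module.
from django.core.serializers.json import *


class Serializer(Serializer):

    def _init_options(self):
        super(Serializer, self)._init_options()
        # Write non-ASCII characters as-is instead of \uXXXX escapes.
        self.json_kwargs['ensure_ascii'] = False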

Then register the new serializer (for example, in your app's __init__.py file):

from django.core.serializers import register_serializer

register_serializer('json-no-uescape', 'serializers.json_no_uescape')

Then you can run:

manage.py dumpdata --format=json-no-uescape > output.json

Answer 8

Just leaving this here:

./manage.py dumpdata --indent=2 core.item | python3 -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" > core/fixtures/item.json

Answer 9

I ran into the same problem. After reading all the answers, I came up with a combination of Ali's and darthwade's answers:

manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)

In Python 3, I had to open the source file in binary mode and decode it as unicode-escape. I also added utf-8 when opening the destination file in write (binary) mode.

I hope it helps :)

Answer 10

This problem was fixed for JSON and YAML in Django 3.1.

Answer 11

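Here is a solution from djangoproject.com:
In Windows, go to Settings, Language - Administrative language settings - Change system locale, and tick the "Use Unicode UTF-8 for worldwide language support" box under Region settings. Once that is applied and the machine is restarted, Python picks up a sensible, modern default encoding.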

Django dumpdata UTF-8 (Unicode)

11 answers: