Question

Fellows,

I am unable to parse a Unicode text file submitted through a Django form. Here are the quick steps I followed:

Uploaded a text file (encoding: UTF-16) (file contents: Hello World 13)
On the server side, used filename = request.FILES['file_field']
Iterated line by line: for line in filename: yield line
type(filename) gave me <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>
type(line) is <type 'str'>
print line: '\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00W\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'
codecs.BOM_UTF16_LE == line[:2] returned True
Now I want to reconstruct a Unicode or ASCII string such as "Hello World 13" so that I can parse the integer out of the line.
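A minimal sketch of decoding that line, assuming Python 3 (where the uploaded data arrives as bytes rather than str). The 'utf-16' codec reads the BOM to choose the byte order and drops it, so the raw bytes shown above decode directly to text; the regular expression used to pull out the integer is an illustrative choice, not something from the question.

```python
import codecs
import re

# The raw bytes from the question: a UTF-16 little-endian BOM followed by the text
raw = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00W\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'

assert raw[:2] == codecs.BOM_UTF16_LE

# 'utf-16' uses the BOM to pick the byte order and strips it from the result
text = raw.decode('utf-16')
print(repr(text))  # 'Hello World 13'

# Find every run of digits in the decoded text and convert each to int
numbers = [int(n) for n in re.findall(r'\d+', text)]
print(numbers)  # [13]
```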

One of the ugliest ways to do this is to take line[-5:] (= '\x001\x003\x00') and then build the number from line[-5:][1] and line[-5:][3].
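The slicing approach works only because in UTF-16LE each ASCII character is stored as its byte followed by \x00, and because the number happens to have exactly two digits. A short comparison, assuming Python 3; the second sample string is made up for illustration, and the trailing-digits regex is just one way to locate the number after decoding.

```python
import codecs
import re

def hack_number(raw):
    # The question's approach: last 5 bytes, keep the bytes at positions 1 and 3
    tail = raw[-5:]
    return int(tail[1:2] + tail[3:4])

def decoded_number(raw):
    # Decode first, then take the run of digits at the end of the text
    text = raw.decode('utf-16')
    return int(re.search(r'(\d+)\s*$', text).group(1))

for sample in ['Hello World 13', 'Hello World 123']:
    raw = codecs.BOM_UTF16_LE + sample.encode('utf-16-le')
    print(sample, hack_number(raw), decoded_number(raw))
# Hello World 13 13 13
# Hello World 123 23 123
```

The byte-picking version silently returns 23 for the three-digit case, while decoding first gives 123.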

I believe there must be a better way to do this. Please help.

Thanks in advance!

Answer 1

Use codecs.iterdecode() to decode the object on the fly as you iterate over it:

from codecs import iterdecode

for line in iterdecode(filename, 'utf16'): yield line
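A self-contained check of how iterdecode behaves, assuming Python 3 and using io.BytesIO as a stand-in for the uploaded file (the Django upload object is assumed to split lines in a similar way; the two-line sample content is made up). Iterating a binary file object such as io.BytesIO splits after each b'\n' byte, which in UTF-16LE falls in the middle of the two-byte newline b'\n\x00'. iterdecode keeps the dangling byte in its incremental decoder, so the decoded text is correct, but the pieces it yields do not line up with the real lines; joining them and calling splitlines() restores proper lines.

```python
import codecs
import io

# Stand-in for request.FILES['file_field']: UTF-16LE with BOM, two lines
data = codecs.BOM_UTF16_LE + 'Hello World 13\nSecond 42\n'.encode('utf-16-le')
upload = io.BytesIO(data)

# Iterating a binary file yields chunks that end at each b'\n' byte
chunks = list(upload)

# iterdecode decodes incrementally, carrying half-characters across chunks
pieces = list(codecs.iterdecode(chunks, 'utf-16'))
print(pieces)  # ['Hello World 13', '\nSecond 42', '\n']

# Rejoin the pieces and split on real line boundaries
lines = ''.join(pieces).splitlines()
print(lines)  # ['Hello World 13', 'Second 42']
```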