Question

I have a view that accepts a file (.doc, .docx, or .pdf):

from rest_framework.decorators import api_view
from rest_framework.parsers import FileUploadParser

@api_view(['POST'])
@parser_classes((FileUploadParser,) )
def parse_document(request, format=None):
    file_obj = request.data['file']

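I need to parse these files and return JSON. I am using Textract to convert the documents to text, but to do that I have to pass Textract a file path, so I need to write the file to the filesystem temporarily.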

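I have tried reading the file's contents and writing them to a temporary file, but I end up with garbled text such as \x00\x14. I have also tried decoding the file as 'utf-8', but I get this error: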

'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte

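I can read .txt files without problems; the error only occurs for file types other than .txt. I feel there must be a way to copy the file to temporary storage without reading its contents.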

@api_view(['POST'])
@parser_classes((FileUploadParser,) )
def parse_resume(request, format=None):
    file_obj = request.data['file']

    tempf, tempfn = tempfile.mkstemp()
    try:
        for chunk in file_obj.chunks():
            os.write(tempf, chunk)
    except:
        raise Exception("Problem with the input file %s" % file_obj.name)
    finally:
        text = textract.process(tempfn).decode('utf-8') # This is where the error described above is thrown
        os.close(tempf)

    return Response({"text": None})

Answer 1

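Handling uploads through rest-framework is essentially the same as handling them in plain Django (except that you can use request.data instead of request.FILES). Try breaking the problem into smaller parts and see where things start to break. For example: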

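1. Create a simple Django view and store the file at a hard-coded file path. Make sure everything works. Here is the documentation: https://docs.djangoproject.com/en/2.0/topics/http/file-uploads/
2. Replace the Django view with a rest-framework view.
3. Replace the hard-coded file path with a temporary file.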
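
As a rough illustration of the last step, here is a minimal sketch of copying an upload to a temporary file in binary mode, without decoding it. The helper name save_upload_to_temp is hypothetical and not part of Django, rest-framework, or Textract. It assumes file_obj behaves like Django's UploadedFile (a .name attribute and a .chunks() method yielding bytes). It also keeps the original file extension on the temporary file, on the assumption that Textract picks its parser from the extension, which may be why a temp file without a suffix produced garbled output.

```python
import os
import tempfile


def save_upload_to_temp(file_obj):
    """Copy an uploaded file to a temporary file and return its path.

    Assumption: file_obj looks like Django's UploadedFile, i.e. it has
    a .name attribute and a .chunks() method that yields bytes.
    """
    # Keep the original extension (.doc, .docx, .pdf) on the temp file.
    suffix = os.path.splitext(file_obj.name)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Binary mode: the raw bytes are copied, nothing is decoded.
        with os.fdopen(fd, "wb") as tmp:
            for chunk in file_obj.chunks():
                tmp.write(chunk)
    except Exception:
        # Do not leave a half-written temp file behind.
        os.remove(path)
        raise
    return path


# Hypothetical use inside the view (not executed here):
#     path = save_upload_to_temp(request.data['file'])
#     try:
#         text = textract.process(path).decode('utf-8')
#     finally:
#         os.remove(path)
```

The file is closed before Textract opens it, so all bytes are flushed to disk, and the caller removes the temporary file once the text has been extracted.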

How do I write a file received through Django REST framework to temporary storage?
