Automatically convert docx/pdf files to text files after saving a new model in a Django class-based view

Time: 2019-04-04 16:21:32

Tags: python django

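I am setting up a Django application for online job applications. My model contains some CharFields and a FileField for the résumé. What I want is this: every time a new résumé is saved, it should automatically be converted to txt format and stored in the media folder. The problem is that the conversion only works after I restart the server.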

Here is my view:

from django.shortcuts import render
from rest_framework import viewsets, permissions
from rest_framework.parsers import FormParser, MultiPartParser
from .serializers import candidateSerializer
from .models import Candidate
from .conversion import convertPDF, convertDOCX, handle_uploaded_file
#from django.db.models.signals import post_save
#from django.dispatch import receiver
from rest_framework.response import Response
from rest_framework.decorators import action
#from django.core.files import File



class candidateView(viewsets.ModelViewSet):
    permission_classes = [
        permissions.AllowAny,
    ]
    serializer_class = candidateSerializer
    queryset = Candidate.objects.all()
    cv = list(queryset.values('CV'))
    cvName = [el['CV'] for el in cv]
    file = cvName[len(cvName)-1]
    handle_uploaded_file(file)
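
A note on why the view above behaves as described: the statements from `queryset = ...` down to `handle_uploaded_file(file)` sit directly in the class body, and Python runs a class body once, when the class definition itself is executed (for a Django view module, at import time during server start). The sketch below reproduces that effect in plain Python. `CandidateView`, `create` and the recording stand-in for `handle_uploaded_file` are illustrative assumptions, not the asker's real code.

```python
calls = []


def handle_uploaded_file(name):
    # Stand-in for the real converter: it only records that it was called.
    calls.append(name)


class CandidateView:
    # Class-body statements run once, when Python executes the class
    # definition (for a view module, when it is imported at server start).
    handle_uploaded_file("latest_at_startup.pdf")

    def create(self, name):
        # Method bodies run every time the method is called.
        handle_uploaded_file(name)


# Snapshot taken right after the class statement has been executed.
calls_after_definition = list(calls)

view = CandidateView()
view.create("new_upload.pdf")
calls_after_create = list(calls)
```

The first call happens without any instance or request existing, which matches the symptom of the conversion only running after a restart. In the view above, an empty `Candidate` table would also make `cvName[len(cvName)-1]` raise an `IndexError` at import time.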

Here is the handle_uploaded_file function used to convert the uploaded file:

import os

import docx
import docx2txt
import PyPDF2


def handle_uploaded_file(file):
    Dir = 'C:/workspace/backend/media/'
    textDir = 'C:/workspace/backend/media/textResumes/'

    # Strip only the final extension so names containing dots survive
    name = os.path.splitext(file)[0]
    textfilename = name + '.txt'
    filename = Dir + file

    # Pick the extractor that matches the file extension
    if file.endswith(".pdf"):
        doc = convertPDF(filename)
    elif file.endswith(".DOCX"):
        doc = docx2txt.process(filename)
    elif file.endswith(".docx"):
        doc = convertDOCX(filename)
    else:
        return  # unsupported extension: nothing to convert

    # The context manager closes the file even if the write fails
    with open(textDir + textfilename, 'w', encoding="utf-8") as f:
        f.write(doc)

def convertPDF(fname):
    # PdfFileReader, numPages, getPage and extractText are the legacy
    # PyPDF2 names (pre-3.0); newer releases renamed them.
    with open(fname, 'rb') as f:
        # Read from the open file handle rather than reopening the path
        pdfReader = PyPDF2.PdfFileReader(f)
        content = []
        for i in range(pdfReader.numPages):
            pageObj = pdfReader.getPage(i)
            content.append(pageObj.extractText())
    # Concatenate the text of all pages into one string
    return ''.join(content)

def convertDOCX(fname):
    doc = docx.Document(fname)
    # Concatenate the text of all paragraphs into one string
    return ''.join(para.text for para in doc.paragraphs)
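
The three branches of `handle_uploaded_file` differ only in which extractor they call, so the dispatch can be expressed as a lookup table. The following is a minimal sketch under stated assumptions: `convert_to_text`, its parameters and the `converters` mapping are hypothetical names introduced here, and the output is written as a flat `<stem>.txt` inside `text_dir`, which differs from the original if the stored name contains a subdirectory.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def convert_to_text(
    relative_name: str,
    media_dir: Path,
    text_dir: Path,
    converters: Dict[str, Callable[[str], str]],
) -> Optional[Path]:
    """Extract text from media_dir/relative_name into text_dir/<stem>.txt.

    `converters` maps a lowercase extension (e.g. ".pdf") to a function
    that takes a file path and returns the extracted text.
    Returns the path of the written file, or None for unsupported types.
    """
    source = media_dir / relative_name
    # Lowercasing the suffix handles ".DOCX" and ".docx" with one entry.
    converter = converters.get(source.suffix.lower())
    if converter is None:
        return None
    text_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last extension, so "jane.doe.pdf"
    # becomes "jane.doe.txt" rather than "jane.txt".
    target = text_dir / (source.stem + ".txt")
    target.write_text(converter(str(source)), encoding="utf-8")
    return target
```

In the project from the question, the mapping would presumably be something like `{".pdf": convertPDF, ".docx": convertDOCX}`, with `media_dir` and `text_dir` pointing at the two hard-coded folders.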

1 Answer:

Answer 0 (score: 0)

A similar question has been asked before. I am not sure why you are also creating the model class in the view file.
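
One way to run the conversion on every upload rather than once at startup is to hook into Django REST framework's `perform_create`, which `ModelViewSet` calls each time a new object is created through the API. The sketch below is an assumption about how that could look, not a tested fix for the asker's project: `ConvertOnCreateMixin`, `file_field_name` and `convert_file` are hypothetical names, and the mixin is written without importing DRF so that it can be read and exercised on its own.

```python
class ConvertOnCreateMixin:
    """Run a conversion hook each time a new object is created."""

    # Hypothetical attribute: name of the FileField on the model
    # ("CV" in the question).
    file_field_name = "CV"

    def convert_file(self, relative_name):
        # Hypothetical hook: override it to call the project's own
        # converter, e.g. handle_uploaded_file(relative_name).
        raise NotImplementedError

    def perform_create(self, serializer):
        # DRF's create() calls perform_create() for every POST;
        # serializer.save() returns the newly saved model instance.
        instance = serializer.save()
        uploaded = getattr(instance, self.file_field_name)
        if uploaded:
            # For a Django FieldFile, .name is the path relative to
            # MEDIA_ROOT, which is what handle_uploaded_file expects.
            self.convert_file(uploaded.name)
```

In the question's code this would be used as `class candidateView(ConvertOnCreateMixin, viewsets.ModelViewSet)`, with `convert_file` calling `handle_uploaded_file` and the module-level conversion lines removed from the class body. A `post_save` signal receiver, as hinted at by the commented-out imports in the view, would be an alternative place for the same call.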