Quickly converting PDFs to JPEGs and storing the images temporarily

Asked: 2019-06-03 22:45:26

Tags: python pdf flask

Our application dynamically loads 10,000 PDFs into the DOM via Elasticsearch. We need all 10,000 PDFs available when visually searching for a specific document.

User Interface
 __________
|          |   ID: 87819237
|          |   Filename: Application.pdf
|    PDF   |   Size: 105kbs
|          |   Path: /project/XYZ/case001
|          |
|__________|

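Loading 10,000 PDFs is a nightmare. We implemented lazy loading, but the PDFs still load slowly and can take 2-3 seconds. So I developed a way to quickly convert each PDF to a JPEG: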

<img src="{{ url_for('get_file', filepath=data['filepath']) }}">

Route

import os
import tempfile

from flask import request, send_from_directory
from pdf2image import convert_from_path

# Endpoint for fetching a JPEG thumbnail of a PDF.
# An <img src="..."> tag issues a GET request, so the route must accept GET.
@app.route('/get_file', methods=["GET"])
def get_file():

    # Get arg from HTML (empty string if missing, so the placeholder is served)
    filepath = request.args.get('filepath', '')
    dirname, fname = os.path.split(filepath)

    # Check if source file exists
    if os.path.isfile(filepath):

        # Prepare new image file name
        base_filename = os.path.splitext(fname)[0] + '.jpg'
        save_dir = './static/images'

        # If the image doesn't already exist, create it
        if not os.path.isfile(os.path.join(save_dir, base_filename)):
            with tempfile.TemporaryDirectory() as path:
                # Render only the first page (page numbers are 1-based) at low resolution
                images_from_path = convert_from_path(filepath, output_folder=path, first_page=1, last_page=1, dpi=15)
                # Save inside the with block, while the temporary files still exist
                for page in images_from_path:
                    page.save(os.path.join(save_dir, base_filename), 'JPEG')

        # The image file stored on the server
        output = send_from_directory(save_dir, base_filename)

    else:
        output = send_from_directory('./static/images', 'placeholder.jpeg')

    return output

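This works well, and converting a PDF is faster than loading it, but I wonder whether this is the best approach. Can I save the images somewhere that gets wiped when the session closes, like a temporary folder?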
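One possible shape for the "temporary folder" idea, as a minimal sketch rather than a tested answer: create a cache directory with `tempfile.mkdtemp` when the server starts and register its removal with `atexit`. The names `THUMB_DIR`, `thumbnail_path`, `get_or_create_thumbnail` and the `render` callback are hypothetical and not part of the code above; `render` stands in for the pdf2image conversion. Note the assumptions: the directory lives as long as the server process, not an individual browser session, and `atexit` handlers do not run if the process is killed abruptly.

```python
import atexit
import hashlib
import os
import shutil
import tempfile

# Hypothetical cache directory: created once per server process and
# removed when the process exits normally.
THUMB_DIR = tempfile.mkdtemp(prefix="pdf_thumbs_")
atexit.register(shutil.rmtree, THUMB_DIR, ignore_errors=True)


def thumbnail_path(pdf_path):
    """Return the cache location for the thumbnail of pdf_path.

    The file name is a hash of the absolute path, so two files named
    Application.pdf in different folders do not overwrite each other.
    """
    digest = hashlib.sha1(os.path.abspath(pdf_path).encode("utf-8")).hexdigest()
    return os.path.join(THUMB_DIR, digest + ".jpg")


def get_or_create_thumbnail(pdf_path, render):
    """Return the cached thumbnail path, calling render(pdf_path, target) on a miss.

    `render` is a placeholder for the actual PDF-to-JPEG conversion.
    """
    target = thumbnail_path(pdf_path)
    if not os.path.isfile(target):
        render(pdf_path, target)
    return target
```

In the route, the returned path could then be split into directory and file name and passed to `send_from_directory`. If the server runs several worker processes, each one would get its own directory under this sketch.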

0 answers:

No answers