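I would like to know the most efficient way to read and write image and PDF files as numpy arrays for processing.
So far I have looked at scipy.ndimage.imread
and at PIL combined with numpy,
with the following results: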
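import os
import glob
# Note: scipy.ndimage.imread needs an older SciPy (it was removed in 1.2).
from scipy.ndimage import imread
from PIL import Image
import numpy as np
import timeit

iters = 2  # number of timed repetitions per function


def scipy_fun():
    # Read every JPEG in the current directory with scipy.
    for x in glob.glob("*.jpg"):
        px = imread(x)


def PIL_fun():
    # Read every JPEG with PIL and convert it to a numpy array.
    for x in glob.glob("*.jpg"):
        with Image.open(x) as im:
            px = np.array(im)


print(timeit.Timer(scipy_fun).timeit(number=iters))
print(timeit.Timer(PIL_fun).timeit(number=iters))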
Running the script gives similar timings for both (in seconds), with scipy slightly ahead:
2.8794324089019234
3.0174482765699095
Is there a faster way?
Answer 0 (score: 0)
First, install pdf2image (it wraps Poppler, which must also be installed on the system):
pip install pdf2image
Then:
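import numpy as np
from pdf2image import convert_from_path as read

# convert_from_path returns a list of PIL images, one per page.
# Take the first page as a numpy array to work with in OpenCV or PIL.
# The array is in RGB order; OpenCV functions generally expect BGR.
img = np.asarray(read('path to the pdf file')[0])  # first page of the PDF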
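If you need every page rather than just the first, or want to write an array back out as an image, something along these lines should work. This is only a minimal sketch: the helper names (pages_to_arrays, pdf_to_arrays, array_to_file) are made up for illustration and are not part of pdf2image or PIL, and forcing every page to RGB is an assumption about what the downstream code wants. It does not address the speed question, only the conversion itself.

```python
import numpy as np
from PIL import Image


def pages_to_arrays(pages):
    """Convert an iterable of PIL images into a list of uint8 RGB arrays.

    Hypothetical helper: each page is converted to RGB first, so every
    array has shape (height, width, 3).
    """
    return [np.asarray(page.convert("RGB")) for page in pages]


def pdf_to_arrays(pdf_path, dpi=200):
    """Render all pages of a PDF and return one numpy array per page.

    Hypothetical helper. pdf2image (and Poppler) are imported lazily,
    so pages_to_arrays can be used without them installed.
    """
    from pdf2image import convert_from_path
    return pages_to_arrays(convert_from_path(pdf_path, dpi=dpi))


def array_to_file(arr, out_path):
    """Write a uint8 numpy array back to disk as an image via PIL.

    The output format is inferred by PIL from the file extension.
    """
    Image.fromarray(arr).save(out_path)
```

A lower dpi value makes rendering faster and the arrays smaller, at the cost of resolution, so it is worth tuning if the PDFs are large.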