Averaging the pages of a multi-page TIFF in Python

Posted: 2014-05-12 22:08:09

Tags: python image-processing numpy tiff

What is the fastest and most memory-efficient way to get the average of a multi-frame 16-bit TIFF image as a numpy array?
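Because the frames are 16-bit, the per-pixel running sum over 1000 frames can easily exceed the `uint16` maximum of 65535, which is presumably why methods 2-4 below accumulate into `float64`. A minimal numpy-only illustration of what goes wrong otherwise; the two frames and their values are made up for the demo:

```python
import numpy as np

# Two synthetic 16-bit "frames" with values near the top of the uint16 range
frames = [np.full((2, 3), 40000, dtype=np.uint16),
          np.full((2, 3), 50000, dtype=np.uint16)]

# Accumulating in uint16 silently wraps around modulo 2**16
acc16 = np.zeros((2, 3), dtype=np.uint16)
for f in frames:
    acc16 += f

# Accumulating in float64 keeps the exact sum for integer inputs like these
acc64 = np.zeros((2, 3), dtype=np.float64)
for f in frames:
    acc64 += f

mean_wrapped = acc16 / len(frames)   # wrong: (90000 - 65536) / 2
mean_correct = acc64 / len(frames)   # right: 90000 / 2
print(mean_wrapped[0, 0], mean_correct[0, 0])
```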

What I have come up with so far is the code below. To my surprise, method2 was faster than method1.

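But for profiling, never assume: test it! So I want to test more. Is Wand worth trying? I did not include it here because, even after installing ImageMagick-6.8.9-Q16 and setting MAGICK_HOME, it still does not import... Are there any other libraries for multi-page TIFF in Python? GDAL might be a bit too much for this.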
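For the "test it" part: a single `time()` measurement per method is noisy (the code below records 0.90-1.53 s for method2 alone). A small sketch of a more repeatable harness using the standard library's `timeit.repeat`; the helper name `best_time` is hypothetical:

```python
import timeit

def best_time(func, *args, repeat=5):
    """Run func(*args) `repeat` times and return the fastest wall-clock time.

    The minimum is the usual summary because slower runs mostly reflect
    interference from other processes. Note that after the first run the
    TIFF file is probably in the OS disk cache, so this measures warm-cache
    performance.
    """
    times = timeit.repeat(lambda: func(*args), number=1, repeat=repeat)
    return min(times)

# Hypothetical usage with the functions and path from the code below:
# for f in (method1, method2, method3, method4):
#     print(f.__name__, best_time(f, fp))

if __name__ == "__main__":
    print(best_time(sum, range(100_000)))
```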

(Edit) I have included libtiff. method2 is still the fastest and the more memory-efficient one.
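A rough way to see why method2 is lighter on memory: method1 materializes every frame at once with `asarray()` before calling `mean(axis=0)`, while method2 only holds one frame plus a `float64` accumulator. The frame size is not given in the question, so 512 × 512 below is a hypothetical value; only the 1000 frames and 16 bits per pixel come from the question:

```python
import numpy as np

def peak_bytes(nframes, h, w, pixel_dtype=np.uint16):
    """Approximate array memory (bytes) held by each strategy, ignoring
    library overhead and any temporaries numpy creates inside mean()."""
    px = np.dtype(pixel_dtype).itemsize
    acc = np.dtype(np.float64).itemsize
    whole_stack = nframes * h * w * px       # method1: all frames in one array
    streaming = h * w * px + h * w * acc     # method2: one frame + accumulator
    return whole_stack, streaming

# Hypothetical 512 x 512 frames; 1000 frames of 16-bit pixels as in the question
stack, stream = peak_bytes(1000, 512, 512)
print(stack / 2**20, "MiB vs", stream / 2**20, "MiB")
```

Under these assumptions the full stack needs 500 MiB while the streaming approach needs about 2.5 MiB, independent of the number of frames.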

from time import time

#import cv2  ## no multi page tiff support
import numpy as np
from PIL import Image
#from scipy.misc import imread  ## no multi page tiff support
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
from libtiff import TIFF # https://code.google.com/p/pylibtiff/

fp = r"path/2/1000frames-timelapse-image.tif"

def method1(fp):
    '''
    using tifffile.py by Christoph (Version: 2014.02.05)
    (http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html)
    '''
    with tifffile.TiffFile(fp) as imfile:
        return imfile.asarray().mean(axis=0)


def method2(fp):
    'frame-by-frame accumulation with tifffile.py; keeps peak memory low'
    with tifffile.TiffFile(fp) as imfile:

        # assumes a single grayscale series of shape (frames, height, width)
        nframe, h, w = imfile.series[0].shape
        temp = np.zeros( (h,w), dtype=np.float64 )

        for n in range(nframe):
            curframe = imfile.asarray(n)
            temp += curframe

        return (temp / nframe)


def method3(fp):
    ' like method2 but using pillow 2.3.0 '
    im = Image.open(fp)

    w, h = im.size
    temp = np.zeros( (h,w), dtype=np.float64 )

    n = 0
    while True:
        curframe = np.array(im.getdata()).reshape(h,w)
        temp += curframe
        n += 1
        try:
            im.seek(n)
        except EOFError:  # Pillow raises EOFError when seeking past the last frame
            break

    return (temp / n)


def method4(fp):
    '''
    https://code.google.com/p/pylibtiff/
    documentation seems outdated.
    '''

    tif = TIFF.open(fp)
    header = tif.info()

    meta = dict()  # extract metadata from the text header
    for line in header.splitlines():
        if line.find(':') > 0:
            key, _, value = line.partition(':')
        elif line.find('=') > 0:
            key, _, value = line.partition('=')
        else:
            continue  # skip blank lines and lines without a key separator
        meta[key] = value

    nframes = int(meta['frames'])
    h = int(meta['ImageLength'])
    w = int(meta['ImageWidth'])

    temp = np.zeros( (h,w), dtype=np.float64 )

    for frame in tif.iter_images():
        temp += frame

    return (temp / nframes)

t0 = time()
avgimg1 = method1(fp)
print(time() - t0)
# 1.17-1.33 s

t0 = time()
avgimg2 = method2(fp)
print(time() - t0)
# 0.90-1.53 s, usually about 20% faster than method1

t0 = time()
avgimg3 = method3(fp)
print(time() - t0)
# 21 s

t0 = time()
avgimg4 = method4(fp)
print(time() - t0)
# 1.96-2.21 s  (may not be accurate: I got a warning for every frame with the TIFF file I tested)

np.testing.assert_allclose(avgimg1, avgimg2)
np.testing.assert_allclose(avgimg1, avgimg3)
np.testing.assert_allclose(avgimg1, avgimg4)
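The memory-friendly pattern shared by method2 and method4 can be pulled out into a library-agnostic helper that accepts any iterable of equally shaped 2-D frames. This is a sketch; the name `streaming_mean` is hypothetical, and the demo uses random numpy data in place of a real TIFF file:

```python
import numpy as np

def streaming_mean(frames):
    """Average an iterable of equally shaped 2-D frames without stacking them.

    Only one frame plus a float64 accumulator is alive at a time.
    The frame count is taken from the iteration itself, not from metadata.
    """
    acc = None
    n = 0
    for frame in frames:
        if acc is None:
            acc = np.zeros(np.shape(frame), dtype=np.float64)
        acc += frame          # each uint16 frame is upcast to float64 here
        n += 1
    if n == 0:
        raise ValueError("no frames to average")
    return acc / n

# Synthetic stand-in for 10 TIFF pages of 4 x 5 16-bit pixels
rng = np.random.default_rng(0)
stack = rng.integers(0, 2**16, size=(10, 4, 5), dtype=np.uint16)
avg = streaming_mean(iter(stack))
np.testing.assert_allclose(avg, stack.mean(axis=0))
```

Feeding it from a file depends on the library. With a recent tifffile, something like `streaming_mean(page.asarray() for page in tif.pages)` inside `with tifffile.TiffFile(fp) as tif:` should work, and method4's `tif.iter_images()` could be passed directly with pylibtiff. These pairings are assumptions about the respective APIs, and their speed is untested.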

1 Answer:

Answer 0 (score: -1)

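Simple logic would make me bet my money on method 1 or 3, because methods 2 and 4 have for loops in them. For loops always make your code slower as the input grows.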
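Whether a Python loop is slow depends on what each iteration does. In method2 the loop runs once per frame (1000 times), and each `temp += curframe` is itself a vectorized numpy operation over all pixels of the frame, so the interpreter overhead is paid per frame, not per pixel. Note also that method3 loops over frames too (its `while True`). A minimal sketch comparing the two styles on a synthetic in-memory stack (sizes are arbitrary); it times only the arithmetic, not TIFF decoding, so whatever it prints is not a verdict on the question's numbers:

```python
import numpy as np
from time import perf_counter

rng = np.random.default_rng(1)
stack = rng.integers(0, 2**16, size=(200, 256, 256), dtype=np.uint16)

# Style of method1: one vectorized reduction over the whole stack
t0 = perf_counter()
vec = stack.mean(axis=0)
t_vec = perf_counter() - t0

# Style of method2: a Python loop over frames, vectorized inside each step
t0 = perf_counter()
acc = np.zeros(stack.shape[1:], dtype=np.float64)
for frame in stack:          # 200 iterations, not 200 * 256 * 256
    acc += frame
loop = acc / stack.shape[0]
t_loop = perf_counter() - t0

np.testing.assert_allclose(vec, loop)
print(f"vectorized: {t_vec:.4f} s, per-frame loop: {t_loop:.4f} s")
```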

I would definitely go with method 1: neat and clear to read...

To be sure, I'd say test them. If you don't want to test, I would choose method 1.

Kind regards,