Why does Python read TIFF images more slowly when the image paths are picked with a random generator?

Time: 2018-01-10 08:09:54

Tags: python numpy random tiff

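I am trying to write a Python program that reads a random selection of TIFF images from a large set of TIFF images. Interestingly, I found that when the indices are produced by a random generator and then used to look up the image paths, Python reads the TIFF images (float values) much more slowly than when the image paths are looked up from hard-coded random indices.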

import datetime
import matplotlib.pyplot as plt
import numpy

def read_in_seq(image_filenames, indices):
    return [ plt.imread(image_filenames[index]) for index in indices ]

image_filenames = []

for index in range(15000):
    image_filenames.append("/tmp/%05d" % index + ".tiff")

# These values were generated once with numpy.random.choice(15000, 100) and then hard-coded here
indices=[
  3885,   901,  6233,  7234, 10195,  2204,   469,  2906, 12114, 13515, 12977, 5201,
  8829, 11537,  5400,  9633, 10744, 12991,  2593,  3046,  5103,  1901,  8831, 12454,
  9779,  4714, 10839,  8702,  8537,  2136,  5095,  9006, 13293,  9933,  3584, 10818,
  8594, 11032,  3705,   435,  6679,  8349,  6930,  9741, 12933,  3231,  1849,  7871,
 11752,  8361,  3094,  2229, 14303,  2006,  5554,  1492, 14817, 12690, 10648, 14631,
  6401,  6181,  4401,  7222,  9881,  8381,  7603, 11374, 12702,  6881, 11868, 10967,
 14508, 12930,  3542,  1197,  8387, 11253,  1802, 14732,  7419, 11994,  6083,  8846,
  5370,  4276, 13953, 14409,  8197,  8956,  4717,  3262,  2314, 12527,  5394, 12495,
  6708,  9724,   740, 10416]

print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Normal input read started with size=" + str(len(indices)))
output = read_in_seq(image_filenames, indices) # takes 0.8 seconds
print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Normal input read completed with size=" + str(len(output)))

indices = numpy.random.choice(15000, 100)
print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Random input read started with size=" + str(len(indices)))
output = read_in_seq(image_filenames, indices) # takes ~3 seconds
print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Random input read completed with size=" + str(len(output)))

Here is the output:

2018-01-10 15:30:46.170487: Normal input read started with size=100
2018-01-10 15:30:46.943557: Normal input read completed with size=100
2018-01-10 15:30:46.943718: Random input read started with size=100
2018-01-10 15:30:49.858074: Random input read completed with size=100

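All 15000 TIFF images are the same, about 3 MB each. As you can see, reading 100 of the 15000 TIFF images with the hard-coded random indices (the "Normal input" read) takes only about 0.8 seconds. However, when the indices come from a random generator (e.g. numpy.random), the read takes nearly 3 seconds.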
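As a quick check on the durations quoted above, the logged timestamps can be subtracted directly. This is only a sketch: the helper name `elapsed_seconds` is made up for illustration, and the timestamp strings are copied from the output shown earlier.

```python
from datetime import datetime

# Same timestamp format as the strftime calls in the script above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def elapsed_seconds(start, end):
    """Return the number of seconds between two logged timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

# Timestamps copied from the TIFF run.
normal = elapsed_seconds("2018-01-10 15:30:46.170487", "2018-01-10 15:30:46.943557")
random_read = elapsed_seconds("2018-01-10 15:30:46.943718", "2018-01-10 15:30:49.858074")

print(f"normal={normal:.3f}s random={random_read:.3f}s")
# prints: normal=0.773s random=2.914s
```

So the read driven by freshly generated indices took roughly 3.8 times as long as the hard-coded one in this run.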

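On the other hand, if we modify the code above to read 100 PNG images out of 15000, the time to read them with the hard-coded random indices is almost the same as with indices generated by numpy.random (about 4 seconds).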

for index in range(15000):
    image_filenames.append("/tmp/%05d" % index + ".png")
----
2018-01-10 16:20:30.498341: Normal input read started with size=100
2018-01-10 16:20:34.020450: Normal input read completed with size=100
2018-01-10 16:20:34.020602: Random input read started with size=100
2018-01-10 16:20:38.692906: Random input read completed with size=100

Note that the timings for reading the TIFF images do not include the time spent in numpy.random; they only cover the time spent reading images in read_in_seq.

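Assuming we can only use a single thread, can someone explain why Python reads TIFF images more slowly when the image paths are picked with a random generator, compared with picking the image paths from hard-coded random indices? For example, is it related to CPU floating-point support, hard disk seeking, OS design, or something else?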
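One experiment that might help narrow this down is to read the same list of paths twice in a row and compare the two passes. If the first pass over freshly drawn indices is slow and the second pass over the same indices is fast, that would suggest disk access or OS file caching rather than the way the indices were produced. This is an untested hypothesis, not a confirmed explanation. The sketch below uses hypothetical names (`time_passes`, `read_bytes`) and a stand-in reader on small temporary files so that it runs anywhere; for the real test, pass `plt.imread` and the TIFF paths instead.

```python
import os
import tempfile
import time

def time_passes(read_fn, paths, passes=2):
    """Read the same paths `passes` times; return the seconds taken by each pass."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        for path in paths:
            read_fn(path)
        timings.append(time.perf_counter() - start)
    return timings

def read_bytes(path):
    # Stand-in reader (assumption): replace with plt.imread for the real experiment.
    with open(path, "rb") as handle:
        return handle.read()

# Demo on small temporary files (hypothetical data, not the ~3 MB TIFFs from the question).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_paths = []
    for index in range(5):
        path = os.path.join(tmp_dir, "%05d.bin" % index)
        with open(path, "wb") as handle:
            handle.write(os.urandom(1024))
        demo_paths.append(path)
    timings = time_passes(read_bytes, demo_paths)

print(["%.6f" % seconds for seconds in timings])
```

The demo files were just written, so both passes are likely to be fast here; the comparison only becomes informative on files that have not been read recently, such as a fresh draw from the 15000 images.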

0 answers:

No answers yet