Question

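I have on the order of 10^5 binary files, which I read one by one in a for loop with numpy's fromfile and plot with pyplot's imshow. Each file takes about a minute to read and plot.

Is there a way to speed things up?

Here is some pseudocode to explain my situation: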

#!/usr/bin/env python

import numpy as np
import matplotlib as mpl
mpl.use('Agg')

import matplotlib.pyplot as plt

nx = 1200 ; ny = 1200

fig, ax = plt.subplots()
ax.set_xlabel('x')
ax.set_ylabel('y')

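for f in files:  # files: list of paths to the binary files
  # read nx*ny 32-bit floats and view them as a 2D array
  data = np.fromfile(f, dtype=np.float32, count=nx*ny)
  data = data.reshape(nx, ny)
  im = ax.imshow(data)
  fig.savefig(f+'.png', dpi=300, bbox_inches='tight')
  im.remove()  # drop the image so artists do not pile up on the axes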

I found the last step to be crucial so that memory does not explode.

Answer 1

As the number of images is very large and you are using imshow, I would suggest a different approach.

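1. Create an output figure with the desired dimensions and a blank image (any color will do as long as it differs from the spine color)
2. Save the figure to template.png
3. Load template.png with scipy.ndimage.imread
4. Load the image data into an array
5. Convert the data into colors by using a colormap
6. Scale the image to fit the pixel dimensions of the template (scipy.ndimage.zoom)
7. Copy the pixel data into the template
8. Save the resulting image with scipy.ndimage.save
9. Repeat steps 4 - 8 as many times as needed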
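Steps 4 to 7 can be sketched with NumPy alone. This is only a minimal sketch, not the answerer's code: the function names `to_gray_rgb` and `paste_into_template`, the offsets `top` and `left`, and the linear grayscale mapping standing in for a general colormap are all assumptions made for illustration. The template here is a synthetic white array rather than one loaded from template.png, and saving the result (step 8) is left out.

```python
import numpy as np

def to_gray_rgb(data):
    """Map a 2D float array to uint8 RGB with a linear grayscale (assumed colormap)."""
    lo, hi = float(data.min()), float(data.max())
    span = hi - lo if hi > lo else 1.0        # avoid division by zero on flat data
    gray = ((data - lo) / span * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)   # shape (rows, cols, 3)

def paste_into_template(template, rgb, top, left):
    """Return a copy of the template with rgb copied in at row top, column left."""
    out = template.copy()
    h, w = rgb.shape[:2]
    out[top:top + h, left:left + w, :3] = rgb
    return out

# Synthetic stand-ins: a white 1400x1400 template and one 1200x1200 data array
template = np.full((1400, 1400, 3), 255, dtype=np.uint8)
data = np.random.rand(1200, 1200).astype(np.float32)
frame = paste_into_template(template, to_gray_rgb(data), top=100, left=100)
```

Because the template is copied for every frame, the axes, labels and spines are rendered only once, which is the point of the approach.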

This bypasses a lot of the rendering work. Some comments:

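Step 1 may take quite a bit of fiddling (antialiasing in particular may need attention; it is beneficial to have a sharp black/white border at the edges of the spines)
If step 4 is slow (I do not see why it would be), try numpy.memmap
If you can, try to use a colormap that can be produced from the data by simple arithmetic operations (e.g. grayscale or a simple variant of it); then step 5 can be made faster
If you can live with images where the data is unscaled (i.e. the area used by the original imshow is 1200x1200), you can get rid of the slow scaling operation (step 6); it also helps if you can downsample by an integer factor
If you need to resample the image in step 6, you may also check the functions in the cv2 (OpenCV) module; they may be faster than scipy.ndimage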
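Two of these comments, numpy.memmap for step 4 and integer downsampling for step 6, might look like the following. It is a sketch under stated assumptions: the helper names `read_frame` and `downsample` are made up here, block averaging is just one possible way to downsample, and the demo file is a throwaway created only so the example runs on its own.

```python
import os
import tempfile
import numpy as np

nx = ny = 1200

def read_frame(path):
    """Map one raw float32 file into memory without copying it up front."""
    return np.memmap(path, dtype=np.float32, mode='r', shape=(ny, nx))

def downsample(frame, factor):
    """Average non-overlapping factor x factor blocks (factor must divide both sides)."""
    h, w = frame.shape
    blocks = frame.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Demo on a throwaway file standing in for one of the real data files
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
np.arange(nx * ny, dtype=np.float32).tofile(path)
small = downsample(read_frame(path), 4)   # 1200x1200 -> 300x300
```

Downsampling by an integer factor only needs a reshape and a mean, so no interpolation is involved.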

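Performance-wise, the slowest operations are 5, 6 and 9. I would expect the function to handle about ten arrays per second. Above that, disk I/O will start to be the limiting factor. If the processing step is the limiting factor, I would just start four copies of the script (assuming four cores), each copy having access to a different set of 2.5 x 10^4 images. With an SSD this should not cause an I/O seek disaster.

Only profiling will tell, though.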
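One way to do that profiling is the standard-library cProfile module. In the sketch below, `process_one` is a hypothetical stand-in for whatever reads, colors and saves a single file; swap in the real per-file function to see where the time actually goes.

```python
import cProfile
import io
import pstats

def process_one(n):
    """Hypothetical stand-in for reading, coloring and saving one file."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    process_one(10000)
profiler.disable()

# Collect the report as text, sorted by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(5)
report = buf.getvalue()
print(report)
```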

Answer 2

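Very strange: after a reboot, a solution I do not normally resort to, the read time dropped to ~0.002 seconds per file (on average) and the render time to ~0.02 seconds. Saving the .png file takes ~2.6 seconds, so all in all each frame takes about 2.7 seconds.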

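I took @DrV's advice,

...I would just start four copies of the script (assuming four cores), each copy having access to a different set of 2.5 x 10^4 images. With an SSD this should not cause an I/O seek disaster.

partitioned the file list into 8 sublists, and ran 8 instances of my script.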
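Partitioning the file list can be as simple as the sketch below. The helper `partition` and the generated file names are hypothetical; the answer does not say how the split was actually done.

```python
def partition(files, n):
    """Split files into n sublists whose lengths differ by at most one."""
    return [files[i::n] for i in range(n)]

files = ['%d.bin' % i for i in range(100000)]   # hypothetical file names
chunks = partition(files, 8)
print([len(c) for c in chunks])
```

Each script instance then works through one sublist, for example selected by an index passed on the command line.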

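@DrV's comment

Also, your read time of 0.002 seconds for a 5.7 MB file does not sound realistic if the file is not in the RAM cache, as it would indicate a disk read speed of 2.8 GB/s. (Fast SSDs may just reach 500 MB/s.)

made me benchmark the read/write speed on my laptop (MacBookPro10,1). I used the following code to generate 1000 files of 1200*1200 random floats (4 bytes each), so that each file is 5.8 MB (1200*1200*4 = 5,760,000 bytes), and then read them one by one, timing the process. The code was run from the terminal and never took up more than 50 MB of memory (only one 5.8 MB data array is held in memory at a time, isn't it?).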

Code:

#!/usr/bin/env ipython

import os
from time import time
import numpy as np

# work inside a scratch directory
temp = 'temp'
if not os.path.exists(temp):
    os.makedirs(temp)
    print('temp dir created')
os.chdir(temp)

nx = ny = 1200
nof = 1000
print('\n*** Writing random data to files ***\n')
t1 = time(); t2 = 0; t3 = 0
for i in range(nof):
    if not i%10:
        print(i, end=' ')
    tt = time()
    data = np.array(np.random.rand(nx*ny), dtype=np.float32)
    t2 += time()-tt
    fn = '%d.bin' %i
    tt = time()
    f = open(fn, 'wb')
    f.write(data)
    f.close()
    t3 += time()-tt
print('\n*****************************')
print('Total time: %f seconds' %(time()-t1))
print('%f seconds (on average) per random data production' %(t2/nof))
print('%f seconds (on average) per file write' %(t3/nof))

print('\n*** Reading random data from files ***\n')
t1 = time(); t3 = 0
for i,fn in enumerate(os.listdir('./')):
    if not i%10:
        print(i, end=' ')
    tt = time()
    f = open(fn, 'rb')
    # the files hold 32-bit floats, so say so (the default dtype is float64)
    data = np.fromfile(f, dtype=np.float32)
    f.close()
    t3 += time()-tt
print('\n*****************************')
print('Total time: %f seconds' %(time()-t1))
print('%f seconds (on average) per file read' %(t3/(i+1)))

# clean up:
for f in os.listdir('./'):
    os.remove(f)
os.chdir('../')
os.rmdir(temp)

Results:

temp dir created

*** Writing random data to files ***

0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530 540 550 560 570 580 590 600 610 620 630 640 650 660 670 680 690 700 710 720 730 740 750 760 770 780 790 800 810 820 830 840 850 860 870 880 890 900 910 920 930 940 950 960 970 980 990 
*****************************
Total time: 25.569716 seconds
0.017786 seconds (on average) per random data production
0.007727 seconds (on average) per file write

*** Reading random data from files ***

0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530 540 550 560 570 580 590 600 610 620 630 640 650 660 670 680 690 700 710 720 730 740 750 760 770 780 790 800 810 820 830 840 850 860 870 880 890 900 910 920 930 940 950 960 970 980 990 
*****************************
Total time: 2.596179 seconds
0.002568 seconds (on average) per file read
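The measured averages can be turned into throughput figures with a little arithmetic. This is only a worked calculation on the numbers printed above, not an additional measurement.

```python
file_bytes = 1200 * 1200 * 4     # bytes per file (float32 values), as in the benchmark
read_s = 0.002568                # measured average read time per file
write_s = 0.007727               # measured average write time per file

read_rate = file_bytes / read_s / 1e9     # GB/s
write_rate = file_bytes / write_s / 1e6   # MB/s
print('read:  %.2f GB/s' % read_rate)     # read:  2.24 GB/s
print('write: %.0f MB/s' % write_rate)    # write: 745 MB/s
```

A read rate of about 2.24 GB/s is well above the 500 MB/s quoted in the comment for a fast SSD, which suggests that the freshly written files were served from the RAM cache rather than read from disk.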

Plotting thousands of files using python
