Question

I am trying to read a large number (~200,000) of small binary files into a dictionary in Python, each as a NumPy array:

import os
import numpy as np

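def readfiles(limit):
    # Map each file name to its contents as a float32 array.
    filelist = {}
    for i, filename in enumerate(os.listdir('folder'), start=1):
        filelist[filename] = np.fromfile(os.path.join('folder', filename), dtype='float32')
        # Stop once `limit` files have been read (used for testing only).
        if i >= limit:
            break

    return filelist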

The limit parameter is only there for testing with a smaller number of files; normally I would read every file in the folder.

The first time I ran the script with a fairly large limit (90,000), it took about 68 seconds. If I rerun it immediately, it finishes in ~1.2 seconds. cProfile gives:

>>> cProfile.run('readfiles(90000)')

90005 function calls in 68.768 seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.284    0.284   68.690   68.690 <ipython-input-57-939c6a92cd68>:1(readfiles)
    1    0.079    0.079   68.768   68.768 <string>:1(<module>)
    1    0.000    0.000   68.768   68.768 {built-in method builtins.exec}
90000   68.313    0.001   68.313    0.001 {built-in method numpy.core.multiarray.fromfile}
    1    0.093    0.093    0.093    0.093 {built-in method posix.listdir}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


>>> cProfile.run('readfiles(90000)')

90005 function calls in 1.970 seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.137    0.137    1.900    1.900 <ipython-input-57-939c6a92cd68>:1(readfiles)
    1    0.070    0.070    1.970    1.970 <string>:1(<module>)
    1    0.000    0.000    1.970    1.970 {built-in method builtins.exec}
90000    1.673    0.000    1.673    0.000 {built-in method numpy.core.multiarray.fromfile}
    1    0.090    0.090    0.090    0.090 {built-in method posix.listdir}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

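Later, when I rerun the script in a completely different session, I still get ~1.2 s. This seems odd to me. It looks as if np.fromfile does not actually reread the files once it has read them, and instead reads some cached copy the second time. But I have never heard of cached data being reused across sessions in a case like this. Is that what is happening? If so, how can I change this code so that it really rereads the files? If not, why does the first run take so long?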
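To compare the slow first run with later fast runs without profiler overhead, plain wall-clock timing may be enough. The following is only a minimal sketch; the helper name time_runs is made up for illustration and is not part of NumPy or the standard library.

```python
import time

def time_runs(func, repeats=2):
    """Call func() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, time_runs(lambda: readfiles(90000)) would return one duration per run, so the first and second reads can be compared directly.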

I am using Python 3.5.1 and NumPy 1.11.2.

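Edit: After restarting the system I get the longer run time again, so this must be OS-level caching, as pointed out in the comments. Is there any way to get around it without restarting the system?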
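One possible approach, offered as a sketch under assumptions and not as a confirmed answer: on Linux, the kernel can be asked to drop the cached pages of individual files with os.posix_fadvise and POSIX_FADV_DONTNEED before a timed run. The helper name evict_from_page_cache is hypothetical. The call is only advisory, so the kernel may keep the pages anyway, and it is not available on every platform (macOS, for instance, does not provide it). A system-wide alternative on Linux is to run sync and then write 3 to /proc/sys/vm/drop_caches as root.

```python
import os

def evict_from_page_cache(path):
    """Ask the OS to drop cached pages for `path` (hypothetical helper).

    Returns True if the advice was issued, False if the platform lacks
    os.posix_fadvise or the call failed. The request is advisory only,
    and pages with unwritten changes are generally not dropped.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and length=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True
```

Calling this for every file in 'folder' before readfiles(limit) should, if the kernel honors the advice, make the next run behave like a first run again.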

Does rerunning a Python script that reads files into NumPy arrays not actually reread them?

0 answers: