Reducing the time it takes to write data

Date: 2015-05-20 18:06:03

Tags: python optimization time writing

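I am processing CFD data (applying a rotation to the coordinates). To do this, I:

- Read the file

- Store the data in a structured array

- Manipulate the data (do the calculations)

- Write a new file

It works, but it takes 7 seconds per file, and I have (15000 * 4) files to process...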

    The output folder already exists. The data in it will be erased
    StartReading B--0.000018_tec.dat in progress. - 0.001s elapsed
    EndReading B--0.000018_tec.dat in progress. - 0.433s elapsed
    StartWriting B--0.000018_tec.dat in progress. - 0.435s elapsed
    EndWriting B--0.000018_tec.dat in progress. - 7.585s elapsed

    StartReading B--0.000036_tec.dat in progress. - 7.586s elapsed
    EndReading B--0.000036_tec.dat in progress. - 7.697s elapsed
    StartWriting B--0.000036_tec.dat in progress. - 7.697s elapsed
    EndWriting B--0.000036_tec.dat in progress. - 13.472s elapsed
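Subtracting the timestamps in this log shows that almost all of the time per file goes to the writing phase. The short sketch below does that arithmetic and extrapolates the stated 7 seconds per file to 15000 * 4 files, which comes to roughly 117 hours. The dictionary and tuple layout are names chosen for this illustration only.

```python
# Elapsed-time stamps in seconds, copied from the log above.
# Tuple layout (illustrative): start_read, end_read, start_write, end_write.
log_stamps = {
    "B--0.000018_tec.dat": (0.001, 0.433, 0.435, 7.585),
    "B--0.000036_tec.dat": (7.586, 7.697, 7.697, 13.472),
}

durations = {}
for name, (start_read, end_read, start_write, end_write) in log_stamps.items():
    read_s = end_read - start_read
    write_s = end_write - start_write
    durations[name] = (read_s, write_s)
    print(f"{name}: read {read_s:.3f}s, write {write_s:.3f}s")

# Extrapolate the stated 7 seconds per file to all 15000 * 4 files.
n_files = 15000 * 4
total_hours = n_files * 7 / 3600
print(f"{n_files} files at 7 s each: about {total_hours:.0f} hours")
```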

Do you have any ideas for improving this? I considered changing how the file is written, but I am not sure what that would improve.

The log above is an example of the read/compute/write times.

The script and a sample file, for anyone who wants to try improving it:

http://s000.tinyupload.com/index.php?file_id=80589646527340633700

1 answer:

Answer 0 (score: 1)

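The problem is not the writing itself, but how the data is prepared and formatted for writing.

If you profile the script with something like python -m cProfile -s cumtime Plane_modifier_rev4-multiple_files.py > out.txt, you will see that most of the time is spent formatting arrays (array_str, array2string and the functions they call):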

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003   22.297   22.297 Plane_modifier_rev4-multiple_files.py:6(<module>)
        2    0.282    0.141   21.881   10.941 ASCII_TEC.py:101(write_tecplot)
77424/48512    0.091    0.000   21.527    0.000 numeric.py:1681(array_str)
77424/48512    0.424    0.000   21.477    0.000 arrayprint.py:343(array2string)
    48512    0.928    0.000   21.149    0.000 arrayprint.py:233(_array2string)
   145536    0.360    0.000   12.532    0.000 arrayprint.py:533(__init__)
   145536    5.891    0.000   12.172    0.000 arrayprint.py:547(fillFormat)
    48512    0.219    0.000    7.922    0.000 arrayprint.py:700(__init__)
    48512    0.620    0.000    5.623    0.000 arrayprint.py:465(_formatArray)
   170236    2.416    0.000    4.413    0.000 arrayprint.py:598(__call__)
   631546    1.300    0.000    2.933    0.000 numeric.py:2428(seterr)
   434430    2.310    0.000    2.310    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   315773    0.337    0.000    1.941    0.000 numeric.py:2813(__enter__)
   143356    0.234    0.000    1.814    0.000 fromnumeric.py:1772(any)
   315773    0.359    0.000    1.689    0.000 numeric.py:2818(__exit__)
    48512    0.473    0.000    1.268    0.000 arrayprint.py:639(__init__)
   143356    0.157    0.000    1.163    0.000 {method 'any' of 'numpy.ndarray' objects}
   631546    0.967    0.000    1.034    0.000 numeric.py:2524(geterr)
   143356    0.092    0.000    1.006    0.000 _methods.py:37(_any)
   443944    0.763    0.000    0.944    0.000 arrayprint.py:632(_digits)
   143358    0.166    0.000    0.418    0.000 numeric.py:464(asanyarray)
   145536    0.410    0.000    0.410    0.000 {method 'compress' of 'numpy.ndarray' objects}
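The same kind of report can also be produced from inside Python with the standard cProfile and pstats modules, which is handy for profiling a single function rather than a whole script. This is a minimal sketch: slow_format is a stand-in workload invented for the example, not the asker's code, and profile_call is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def slow_format(values):
    # Stand-in workload (hypothetical): format numbers one by one.
    return ",".join(str(v) for v in values)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time, like "-s cumtime" on the command line,
    # and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(slow_format, range(1000))
print(report)
```

Entries with a large cumtime but a small tottime, such as write_tecplot in the listing above, spend their time in the functions they call, so the rows underneath them are the ones to look at.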

For example, this loop:

    for name in names:
        for col_index in range(0,N,5):  #The tecplot data for each variable are saved within 5 columns
            f.write(str(Data["node"][name][col_index:col_index+5])[1:-1]+"\n")
        f.write("\n"+"\n")

can be rewritten (and it should be faster) as:

    for name in names:
        n = Data["node"][name]
        for col_index in range(0,N,5):  #The tecplot data for each variable are saved within 5 columns
            vs = n[col_index:col_index+5]
            f.write(",".join([str(v) for v in vs])+"\n")
        f.write("\n"+"\n")
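Note that the rewritten loop also changes the output slightly: values are now separated by commas instead of the whitespace that str() of an array produces, and the digits printed for each value may differ from NumPy's array printing. The sketch below is a self-contained version of the same idea with the separator made a parameter, so the original whitespace layout can be kept if needed. write_rows and its arguments are hypothetical names, and it works on any sequence, not only NumPy arrays.

```python
import io


def write_rows(f, values, per_line=5, sep=","):
    """Write values to f, per_line items per row, joined by sep."""
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        f.write(sep.join(str(v) for v in chunk) + "\n")
    # Blank lines separate one variable block from the next, as in the answer.
    f.write("\n\n")


sample = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]

comma_buffer = io.StringIO()
write_rows(comma_buffer, sample)           # comma-separated, as in the answer
comma_text = comma_buffer.getvalue()

space_buffer = io.StringIO()
write_rows(space_buffer, sample, sep=" ")  # whitespace-separated
space_text = space_buffer.getvalue()

print(comma_text)
print(space_text)
```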

Edit

Some changes to write_tecplot:

    def write_tecplot(outfile,Data):
        """
        The expected Data is a dictionary with one structured array: node and one simple array: face
        """
        N = Data["node"].shape[0]   #N is the number of nodes
        E = Data["face"].shape[0]  #E is the number of faces

        #Create the file and the main names
        with open(outfile+'.dat', 'w') as f:
            #Write HEADER
            f.write('TITLE = \"title\"\n')
            f.write('VARIABLES  = ')
            #initialize
            names = Data["node"].dtype.names

            #write variable names
            f.write(u'"'+'\",\"'.join(names)+'"\n')
            f.write('ZONE T="tecdata", N=%s, E=%s, ET=QUADRILATERAL, F=FEBLOCK\n\n'%(N,E))

    #        Data_number =  len(Data["node"])     #Data_number is the 

            #WRITE DATA
            #Write node data
            for name in names:
                n = Data["node"][name]
                for col_index in range(0,N,5):  #The tecplot data for each variable are saved within 5 columns
                    f.write(",".join([str(v) for v in n[col_index:col_index+5]])+"\n")
                f.write("\n\n")

            #Write face data
            face = Data["face"]
            for col_index in range(0,E,1):  #One face per line
                f.write(",".join([str(v) for v in face[col_index]])+"\n")
            f.write("\n\n")
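Whether a further change actually helps is best checked by timing it on representative data. As a rough sketch, the code below compares the per-row f.write pattern used above with a variant that builds all lines first and hands them to writelines in one call. Both functions and the sample data are made up for this illustration and write to an in-memory buffer, so the timings say nothing about disk speed and may differ on real files.

```python
import io
import timeit


def write_per_row(values, per_line=5):
    """One write() call per row, as in write_tecplot above."""
    out = io.StringIO()
    for start in range(0, len(values), per_line):
        out.write(",".join(str(v) for v in values[start:start + per_line]) + "\n")
    return out.getvalue()


def write_batched(values, per_line=5):
    """Build every row first, then pass them all to writelines()."""
    lines = [
        ",".join(map(str, values[start:start + per_line])) + "\n"
        for start in range(0, len(values), per_line)
    ]
    out = io.StringIO()
    out.writelines(lines)
    return out.getvalue()


sample = [i * 0.001 for i in range(20000)]

# Check that both variants produce identical text before comparing speed.
same_output = write_per_row(sample) == write_batched(sample)

t_row = timeit.timeit(lambda: write_per_row(sample), number=5)
t_batch = timeit.timeit(lambda: write_batched(sample), number=5)
print(f"identical output: {same_output}")
print(f"per-row: {t_row:.3f}s, batched: {t_batch:.3f}s")
```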