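This is the first time I've come across code that uses scipy, specifically this script: https://github.com/milossramek/office-interoperability-tools/blob/master/scripts/docompare.py. Most of the time when I run it, it uses all of my PC's memory and makes the machine unusable for several minutes (unless I kill it first). So I profiled it with memory_profiler, and this is what I see: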
310                                  # create arrays with aligned lines
311  1307.492 MiB      0.000 MiB     vh_lines=[] # horizontally aligned lines
312  1307.492 MiB      0.000 MiB     v_lines=[] # original lines from iarray1
313  1307.492 MiB      0.000 MiB     indices=[]
314  4230.641 MiB   2923.148 MiB     for i in range(min(len(tx0),len(tx1))):
315  1384.742 MiB  -2845.898 MiB         l0= GetLine(itrim0, tx0, i)
316  1461.992 MiB     77.250 MiB         l1 = GetLine(itrim1, tx1, i)
317  3286.629 MiB   1824.637 MiB         cline, ind = alignLineIndex(l0, l1)
318  3286.707 MiB      0.078 MiB         vh_lines.append(cline)
319  3286.707 MiB      0.000 MiB         indices.append(ind)
320                                      #cline, ind = alignLineIndex(GetLine(itrim0, tx0, i), GetLine(itrim1, tx1, i), halign=False)
321  4230.410 MiB    943.703 MiB         cline, ind = alignLineIndex(l0, l1, halign=False)
322  4230.633 MiB      0.223 MiB         v_lines.append(cline)
323  4230.691 MiB      0.059 MiB     indices = np.array(indices)
and for def alignLineIndex(l1,l2,halign = True):
267  3365.109 MiB      0.000 MiB     horizPosErr = 0
268  3365.109 MiB      0.000 MiB     if halign:
269  1510.734 MiB  -1854.375 MiB         horizPosErr,l1,l2=align(l1,l2,1)
270
271                                  # align in the vertical direction
272  3365.109 MiB   1854.375 MiB     vertPosErr,ll1,ll2=align(l1,l2,0)
273
274                                  #overlap index
275  3442.359 MiB     77.250 MiB     diff = ll1 != ll2
276  3442.359 MiB      0.000 MiB     if np.sum(ll2) + np.sum(ll1) == 0:
277                                      ovlapindex = 1.0
278                                  else:
279  3442.359 MiB      0.000 MiB         ovlapindex = 1.0 - float(np.sum(diff)) / (np.sum(ll2) + np.sum(ll1))
280
281
282  4060.344 MiB    617.984 MiB     ld1 = distancetransf(ll1)
283  4677.141 MiB    616.797 MiB     ld2 = distancetransf(ll2)
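One thing I noticed while reading these numbers, sketched below as plain arithmetic: the increment for `diff` (77.250 MiB) and the increments for the two `distancetransf` calls (about 617 MiB each) differ by almost exactly a factor of 8. That would be consistent with `diff` being a 1-byte-per-pixel boolean array and each distance transform being a float64 array of the same shape. I am assuming here that `ll1`, `ll2` and `diff` all have the same shape; the variable names in the snippet are only for illustration.

```python
MIB = 2 ** 20

# Increments copied from the memory_profiler output above.
bool_increment_mib = 77.250    # diff = ll1 != ll2
edt_increment_mib = 617.984    # ld1 = distancetransf(ll1)

# A NumPy bool takes 1 byte, so the byte count is also the pixel count
# (assumption: the increment is entirely the new `diff` array).
pixels = int(bool_increment_mib * MIB)

# A float64 array of the same shape needs 8 bytes per pixel.
expected_float64_mib = pixels * 8 / MIB

ratio = edt_increment_mib / bool_increment_mib

print("pixels per line image:", pixels)
print("expected float64 size (MiB):", expected_float64_mib)
print("observed ratio:", round(ratio, 3))
```

If that reading is right, each line image is roughly 81 million pixels, and every full-size float64 array derived from it costs about 618 MiB.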
and inside def distancetransf(image):
 47  4091.801 MiB      0.000 MiB     @profile
 48                                  def distancetransf(image):
 49  4091.801 MiB      0.000 MiB         if image.dtype=='bool':
 50                                          return ndimage.distance_transform_edt(1-image.astype(np.int8))
 51                                      else:
 52  4679.246 MiB    587.445 MiB             return ndimage.distance_transform_edt(1-image)
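To check my understanding of what this function returns, here is a small standalone sketch on a toy image (the array and its size are made up for illustration, not taken from docompare.py). It shows that `ndimage.distance_transform_edt` returns a float64 array, so the result is 8 times larger than a boolean input of the same shape. It also shows that the result could be downcast to float32 afterwards, which I assume would halve what is kept around but would not lower the peak during the call itself.

```python
import numpy as np
from scipy import ndimage

# Toy stand-in for one line image: True marks "ink" pixels.
image = np.zeros((200, 300), dtype=bool)
image[50:60, 100:200] = True

# Distance of every background pixel to the nearest ink pixel.
# For a boolean image, ~image marks the same pixels as 1 - image.astype(np.int8).
dist = ndimage.distance_transform_edt(~image)

print(dist.dtype)                    # dtype of the result
print(dist.nbytes / image.nbytes)    # size relative to the boolean input

# Optional downcast: keeps half the memory once the float64 result is dropped.
dist32 = dist.astype(np.float32)
del dist
```

Whether float32 precision is enough for the comparison that docompare.py does afterwards is something I have not verified.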
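So my question is: is distance_transform_edt to blame for the slowness? If so, how can I change it to improve performance and keep the script from using all the memory? As I said, I'm completely new to scipy and I don't know how to improve it.
Thanks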