Question

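I've written a fatigue analysis program with a GUI. The program takes the strain information for unit loads for each element of a finite element model, reads in a load case using np.genfromtxt('loadcasefilename.txt'), then does some fatigue analysis and saves the result for each element in another array.

The load cases are about 32 MB each as text files, and there are 40 or so of them, which are read and analysed in a loop. The loads for each element are interpolated by taking slices of the load case array.

The GUI and the fatigue analysis run in separate threads. Clicking 'Start' on the fatigue analysis starts the loop over the load cases.

This brings me to my problem. If I have a lot of elements, the analysis does not finish. How early it quits depends on how many elements there are, which makes me think it might be a memory problem. I've tried fixing this by deleting the load case array at the end of each loop (after deleting all the arrays that are slices of it) and running gc.collect(), but this has not had any success.

In MatLab, I'd use the 'pack' function to write the workspace to disk, clear it, and then reload it at the end of each loop. I know this isn't good practice, but it would get the job done! Can I do the equivalent in Python somehow?

Code below: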

for LoadCaseNo in range(len(LoadCases[0]['LoadCaseLoops'])):#range(1):#xxx
    #Get load case data
    self.statustext.emit('Opening current load case file...')
    LoadCaseFilePath=LoadCases[0]['LoadCasePaths'][LoadCaseNo][0]
    #TK: load case paths may be different
    try:
      with open(LoadCaseFilePath):
        pass
    except Exception as e:
        self.statustext.emit(str(e))


    LoadCaseLoops=LoadCases[0]['LoadCaseLoops'][LoadCaseNo,0]
    LoadCase=np.genfromtxt(LoadCaseFilePath,delimiter=',')

    LoadCaseArray=np.array(LoadCases[0]['LoadCaseLoops'])
    LoadCaseArray=LoadCaseArray/np.sum(LoadCaseArray,axis=0)
    #Loop through sections
    for SectionNo in  range(len(Sections)):#range(100):#xxx 
        SectionCount=len(Sections)
        #Get section data
        Elements=Sections[SectionNo]['elements']
        UnitStrains=Sections[SectionNo]['strains'][:,1:]
        Nodes=Sections[SectionNo]['nodes']
        rootdist=Sections[SectionNo]['rootdist']
        #Interpolate load case data at this section
        NeighbourFind=rootdist-np.reshape(LoadCase[0,1:],(1,-1))
        NeighbourFind[NeighbourFind<0]=1e100
        nearest=np.unravel_index(NeighbourFind.argmin(), NeighbourFind.shape)
        nearestcol=int(nearest[1])
        Distance0=LoadCase[0,nearestcol+1]
        Distance1=LoadCase[0,nearestcol+7]
        MxLow=LoadCase[1:,nearestcol+1]
        MxHigh=LoadCase[1:,nearestcol+7]
        MyLow=LoadCase[1:,nearestcol+2]
        MyHigh=LoadCase[1:,nearestcol+8]
        MzLow=LoadCase[1:,nearestcol+3]
        MzHigh=LoadCase[1:,nearestcol+9]
        FxLow=LoadCase[1:,nearestcol+4]
        FxHigh=LoadCase[1:,nearestcol+10]
        FyLow=LoadCase[1:,nearestcol+5]
        FyHigh=LoadCase[1:,nearestcol+11]
        FzLow=LoadCase[1:,nearestcol+6]
        FzHigh=LoadCase[1:,nearestcol+12]
        InterpFactor=(rootdist-Distance0)/(Distance1-Distance0)
        Mx=MxLow+(MxHigh-MxLow)*InterpFactor[0,0]
        My=MyLow+(MyHigh-MyLow)*InterpFactor[0,0]
        Mz=MzLow+(MzHigh-MzLow)*InterpFactor[0,0]
        Fx=-FxLow+(FxHigh-FxLow)*InterpFactor[0,0]
        Fy=-FyLow+(FyHigh-FyLow)*InterpFactor[0,0]
        Fz=FzLow+(FzHigh-FzLow)*InterpFactor[0,0]
        #Loop through section coordinates
        for ElementNo in range(len(Elements)):
            MaterialID=int(Elements[ElementNo,1])
            if Materials[MaterialID]['curvefit'][0,0]!=3:
                StrainHist=UnitStrains[ElementNo,0]*Mx+UnitStrains[ElementNo,1]*My+UnitStrains[ElementNo,2]*Fz

            elif Materials[MaterialID]['curvefit'][0,0]==3:

                StrainHist=UnitStrains[ElementNo,3]*Fx+UnitStrains[ElementNo,4]*Fy+UnitStrains[ElementNo,5]*Mz

            EndIn=len(StrainHist)
            Extrema=np.bitwise_or(np.bitwise_and(StrainHist[1:EndIn-1]<=StrainHist[0:EndIn-2] , StrainHist[1:EndIn-1]<=StrainHist[2:EndIn]),np.bitwise_and(StrainHist[1:EndIn-1]>=StrainHist[0:EndIn-2] , StrainHist[1:EndIn-1]>=StrainHist[2:EndIn]))
            Extrema=np.concatenate((np.array([True]),Extrema,np.array([True])),axis=0)
            Extrema=StrainHist[np.where(Extrema==True)]
            del StrainHist
            #Do fatigue analysis
        self.statustext.emit('Analysing load case '+str(LoadCaseNo+1)+' of '+str(len(LoadCases[0]['LoadCaseLoops']))+' - '+str(((SectionNo+1)*100)/SectionCount)+'% complete')
        del MxLow,MxHigh,MyLow,MyHigh,MzLow,MzHigh,FxLow,FxHigh,FyLow,FyHigh,FzLow,FzHigh,Mx,My,Mz,Fx,Fy,Fz,Distance0,Distance1
    gc.collect()

Answer 1

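There's clearly a retain cycle or some other leak somewhere, but without seeing your code it's impossible to say more than that. But since you seem to be more interested in workarounds than solutions...

"In MatLab, I'd use the 'pack' function to write the workspace to disk, clear it, and then reload it at the end of each loop. I know this isn't good practice, but it would get the job done! Can I do the equivalent in Python somehow?"

No, Python doesn't have any equivalent to pack. (Of course, if you know exactly which set of values you want to keep around, you can always np.savetxt or pickle.dump or otherwise stash them, then exec or spawn a new interpreter instance, then np.loadtxt or pickle.load or otherwise restore those values. But if you know exactly which set of values you want to keep around, you probably aren't going to have this problem in the first place, unless you've actually hit an unknown memory leak in NumPy, which is unlikely.)

But it has something that may be better. Kick off a child process to analyze each element (or each batch of elements, if they're small enough that the process-spawning overhead matters), send the results back in a file or over a queue, then quit.

For example, if you're doing this: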

def analyze(thingy):
    a = build_giant_array(thingy)
    result = process_giant_array(a)
    return result

total = 0
for thingy in thingies:
    total += analyze(thingy)

You can change it to this:

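import multiprocessing

def wrap_analyze(thingy, q):
    # Runs in the child process and sends the result back to the parent.
    q.put(analyze(thingy))

total = 0
for thingy in thingies:
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=wrap_analyze, args=(thingy, q))
    p.start()
    # Read the result before joining: a child that has put data on a
    # queue may not exit until that data has been consumed.
    total += q.get()
    p.join()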

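(This assumes that each thingy and each result is both small and pickleable. If it's a huge NumPy array, look into NumPy's shared memory wrappers, which are designed to make things much easier when you need to share memory directly between processes instead of passing it.)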
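As a rough illustration of sharing a large array rather than pickling it, here is a minimal sketch. It swaps in the standard library's multiprocessing.shared_memory module (Python 3.8+) instead of the NumPy shared memory wrappers mentioned above, and the helper names create_shared_array and sum_shared_array are made up for this example. The worker side is called directly here to keep the sketch short; in real use it would run in a child process that receives only the block name, shape and dtype.

```python
import numpy as np
from multiprocessing import shared_memory

def create_shared_array(source):
    """Copy `source` into a new shared-memory block; return (block, array view)."""
    shm = shared_memory.SharedMemory(create=True, size=source.nbytes)
    view = np.ndarray(source.shape, dtype=source.dtype, buffer=shm.buf)
    view[:] = source
    return shm, view

def sum_shared_array(name, shape, dtype):
    """Attach to an existing block by name (as a worker would) and sum it."""
    shm = shared_memory.SharedMemory(name=name)
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    total = float(view.sum())
    del view      # drop the array so the underlying buffer can be released
    shm.close()   # detach from the block without destroying it
    return total

if __name__ == "__main__":
    data = np.arange(12, dtype=np.float64).reshape(3, 4)
    shm, view = create_shared_array(data)
    try:
        # A real worker would be handed only (shm.name, shape, dtype).
        print(sum_shared_array(shm.name, data.shape, data.dtype))
    finally:
        del view
        shm.close()
        shm.unlink()  # the creating process is responsible for freeing the block
```

The array views have to be deleted before close() is called, because the block cannot be released while an array still points into its buffer.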

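But you may want to look at what multiprocessing.Pool can do to automate this for you (and to make it easier to extend the code to, e.g., use all your cores in parallel). Notice that it has a maxtasksperchild parameter, which you can use to recycle the pool processes every, say, 10 thingies, so they don't run out of memory.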
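A minimal sketch of the Pool variant might look like the following. The analyze function here is only a hypothetical stand-in for the real per-thingy analysis, and the worker count of 2 and the limit of 10 tasks per child are arbitrary choices for illustration.

```python
import multiprocessing

def analyze(thingy):
    # Hypothetical stand-in for the real analysis: build a large
    # temporary structure, then reduce it to one small result.
    giant = [thingy * i for i in range(1000)]
    return sum(giant)

def total_with_pool(thingies, tasks_per_child=10):
    # Each worker process is replaced after `tasks_per_child` tasks, so
    # whatever memory it accumulated is returned to the OS when it exits.
    with multiprocessing.Pool(processes=2,
                              maxtasksperchild=tasks_per_child) as pool:
        return sum(pool.imap(analyze, thingies))

if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this module (e.g. Windows).
    print(total_with_pool(range(40)))
```

Only the small results travel back to the parent process; the large temporaries live and die inside the workers.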

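But back to actually trying to solve things, briefly:

"I've tried fixing this by deleting the load case array at the end of each loop (after deleting all the arrays that are slices of it) and running gc.collect(), but this has not had any success."

None of that should make any difference. If you're just reassigning all of the local variables to new values each time through the loop, and aren't keeping references to them anywhere else, then they're going to get freed anyway, so you'll never have more than 2 of them alive at a (brief) time. And gc.collect() only helps if there are reference cycles. So, on the one hand, it's good news that these had no effect: it means there's nothing obviously stupid in your code. On the other hand, it's bad news: it means that whatever is wrong isn't obviously stupid.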
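To make the distinction concrete, here is a small sketch, assuming CPython (where reference counting frees an object as soon as its last reference goes away). Block is a hypothetical stand-in for a large array, and weak references are used only to observe whether an object is still alive.

```python
import gc
import weakref

class Block:
    """Hypothetical stand-in for a large array."""
    def __init__(self, size):
        self.data = bytearray(size)

def freed_by_rebinding():
    current = Block(1_000_000)
    probe = weakref.ref(current)
    current = Block(1_000_000)  # rebinding drops the last reference to the first Block
    return probe() is None      # True means the first Block is already gone

def cycle_needs_collect():
    was_enabled = gc.isenabled()
    gc.disable()                # keep automatic collection out of the picture
    try:
        a, b = Block(10), Block(10)
        a.other, b.other = b, a  # reference cycle: a -> b -> a
        probe = weakref.ref(a)
        del a, b                 # names are gone, but the cycle keeps both alive
        alive_before = probe() is not None
        gc.collect()             # the cycle collector reclaims them
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()

if __name__ == "__main__":
    print(freed_by_rebinding())
    print(cycle_needs_collect())
```

In the first function neither del nor gc.collect() is needed; in the second, only gc.collect() helps, which is why it makes no difference in code that has no cycles.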

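Usually people see this because they keep growing some data structure without realizing it. For example, maybe you vstack all the new rows onto the old version of giant_array instead of onto an empty array, then delete the old version... but that doesn't matter, because each time through the loop giant_array isn't 5*N, it's 5*N, then 10*N, then 15*N, and so on. (That's just an example of something stupid I did not long ago... Again, it's hard to give a more specific example while knowing nothing about your code.)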
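The vstack mistake can be sketched as follows. The function names leaky and bounded and the 5-row, 3-column shapes are invented for this illustration; each function returns the row count of giant_array after every pass through the loop.

```python
import numpy as np

def leaky(n_loops, rows_per_loop=5, cols=3):
    giant_array = np.empty((0, cols))
    sizes = []
    for _ in range(n_loops):
        new_rows = np.ones((rows_per_loop, cols))
        old = giant_array
        giant_array = np.vstack((old, new_rows))  # stacks onto the previous result
        del old  # frees the old copy, but the new array already contains its rows
        sizes.append(giant_array.shape[0])
    return sizes

def bounded(n_loops, rows_per_loop=5, cols=3):
    sizes = []
    for _ in range(n_loops):
        new_rows = np.ones((rows_per_loop, cols))
        # Start from an empty array on every pass, so nothing accumulates.
        giant_array = np.vstack((np.empty((0, cols)), new_rows))
        sizes.append(giant_array.shape[0])
    return sizes

if __name__ == "__main__":
    print(leaky(3))
    print(bounded(3))
```

In leaky the row count grows with every iteration even though the old version is deleted each time; in bounded it stays constant.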
