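I am trying (in Python) to fit a series of an arbitrary number of Gaussian functions (determined by a simple algorithm that is still being improved) to a data set. For my current sample data set, I have 174 Gaussians. I have a fitting procedure, but it is basically sophisticated guess-and-check, and it consumes all 4 GB of available memory.
Is there a way to accomplish this using something in scipy or numpy?
Here is what I am trying to use, where wavelength[] is a list of x coordinates and fluxc[] is a list of y coordinates: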

    #Pick a gaussian
    for repeat in range(0,2):
        for f in range(0,len(centroid)):
            #Iterate over every other gaussian
            for i in range(0,len(centroid)):
                if i!= f:
                    #For every wavelength,
                    for w in wavelength:
                        #Append the value of each to an list, called others
                        others.append(height[i]*math.exp(-(w-centroid[i])**2/(2*width[i]**2)))

            #Optimize the centroid of the current gaussian
            prev = centroid[f]
            best = centroid[f]
            #Pick an order of magnitude
            for p in range (int(round(math.log10(centroid[i]))-3-repeat),int(round(math.log10(centroid[i])))-6-repeat,-1):
                #Pick a value of that order of magnitude
                for m in range (-5,9):
                    #Change the value of the current item
                    centroid[f] = prev + m * 10 **(p)
                    #Increment over all wavelengths, make a list of the new values
                    variancy = 0
                    residual = 0
                    test = []
                    #Increment across every wavelength and evaluate if this change gets R^2 any larger
                    for k in range(0,len(wavelength)):
                        test.append(height[i]*math.exp(-(wavelength[k]-centroid[f])**2/(2*width[i]**2)))
                        residual += (test[k]+others[k]-cflux[k])**2
                        variancy += (test[k]+others[k]-avgcflux)**2
                    rsquare = 1-(residual/variancy)
                    #Check the R^2 value for this new fit
                    if rsquare > bestr:
                        bestr = rsquare
                        best = centroid[f]
            centroid[f] = best

            #Optimize the height of the current gaussian
            prev = height[f]
            best = height[f]
            #Pick an order of magnitude
            for p in range (int(round(math.log10(height[i]))-repeat),int(round(math.log10(height[i])))-3-repeat,-1):
                #Pick a value of that order of magnitude
                for m in range (-5,9):
                    #Change the value of the current item
                    height[f] = prev + m * 10 **(p)
                    #Increment over all wavelengths, make a list of the new values
                    variancy = 0
                    residual = 0
                    test = []
                    #Increment across every wavelength and evaluate if this change gets R^2 any larger
                    for k in range(0,len(wavelength)):
                        test.append(height[f]*math.exp(-(wavelength[k]-centroid[i])**2/(2*width[i]**2)))
                        residual += (test[k]+others[k]-cflux[k])**2
                        variancy += (test[k]+others[k]-avgcflux)**2
                    rsquare = 1-(residual/variancy)
                    #Check the R^2 value for this new fit
                    if rsquare > bestr:
                        bestr = rsquare
                        best = height[f]
            height[f] = best

            #Optimize the width of the current gaussian
            prev = width[f]
            best = width[f]
            #Pick an order of magnitude
            for p in range (int(round(math.log10(width[i]))-repeat),int(round(math.log10(width[i])))-3-repeat,-1):
                #Pick a value of that order of magnitude
                for m in range (-5,9):
                    if prev + m * 10**(p) == 0:
                        m+=1
                    #Change the value of the current item
                    width[f] = prev + m * 10 **(p)
                    #Increment over all wavelengths, make a list of the new values
                    variancy = 0
                    residual = 0
                    test = []
                    #Increment across every wavelength and evaluate if this change gets R^2 any larger
                    for k in range(0,len(wavelength)):
                        test.append(height[i]*math.exp(-(wavelength[k]-centroid[i])**2/(2*width[f]**2)))
                        residual += (test[k]+others[k]-cflux[k])**2
                        variancy += (test[k]+others[k]-avgcflux)**2
                    rsquare = 1-(residual/variancy)
                    #Check the R^2 value for this new fit
                    if rsquare > bestr:
                        bestr = rsquare
                        best = width[f]
            width[f] = best

            count += 1
            #print '{} of {} peaks optimized, iteration {} of {}'.format(f+1,len(centroid),repeat+1,2)
            complete = round(100*(count/(float(len(centroid))*2)),2)
            print '{}% completed'.format(complete)
        print 'New R^2 = {}'.format(bestr)

Answer 0 (score: 2)
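Yes, it would probably be better (and easier) to use scipy. But first, refactor your code into smaller functions; that alone makes it much easier to read and to understand what is going on.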
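To make the scipy suggestion concrete, here is a minimal sketch of fitting all peaks at once with `scipy.optimize.curve_fit`. The function names `multi_gaussian` and `fit_gaussians` are hypothetical, the synthetic data is only for illustration, and it assumes your existing `height`, `centroid` and `width` lists are reasonable starting guesses. Whether this converges for 174 heavily overlapping peaks is a separate question.

```python
import numpy as np
from scipy.optimize import curve_fit


def multi_gaussian(x, *params):
    """Sum of Gaussians; params is flat: (height, centroid, width) per peak."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(params, dtype=float).reshape(-1, 3)
    # Column vectors of shape (n_peaks, 1) broadcast against x of shape (n_points,)
    h, c, w = p[:, 0:1], p[:, 1:2], p[:, 2:3]
    return np.sum(h * np.exp(-(x - c) ** 2 / (2.0 * w ** 2)), axis=0)


def fit_gaussians(x, y, height, centroid, width):
    """Refine the initial guesses for all peaks with one least-squares fit."""
    p0 = np.column_stack([height, centroid, width]).ravel()
    popt, _ = curve_fit(multi_gaussian, x, y, p0=p0)
    # One row per peak: height, centroid, width
    return popt.reshape(-1, 3)


if __name__ == "__main__":
    # Synthetic two-peak spectrum, purely for illustration
    x = np.linspace(0.0, 10.0, 500)
    rng = np.random.default_rng(0)
    y = multi_gaussian(x, 2.0, 3.0, 0.4, 1.0, 6.5, 0.7)
    y = y + rng.normal(0.0, 0.01, x.size)
    print(fit_gaussians(x, y, [1.5, 1.2], [3.2, 6.3], [0.5, 0.5]))
```

Note that the width only enters squared, so a fitted width can come back negative; take its absolute value if that matters.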
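As for the memory consumption: you are probably over-extending a list somewhere (others is a candidate: I never see it cleared (or initialized!), while it is filled inside a quadruple loop). That, or your data is simply that large (in which case you should really use numpy arrays, if only to speed things up). I can't tell, because you introduce various variables without giving any idea of their size (how big is wavelengths? How big is others? What are the data arrays initialized with, and where?).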
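As a rough illustration of the numpy route, the sketch below evaluates the whole model on the wavelength grid in one call instead of appending to a Python list. The names `model` and `r_squared` are made up for this example, and `r_squared` uses the standard definition (total sum of squares of the data around its mean), which is not the same as the `variancy` term in your loop.

```python
import numpy as np


def model(wavelength, height, centroid, width):
    """Evaluate the sum of all Gaussians on the wavelength grid in one shot."""
    wl = np.asarray(wavelength, dtype=float)[np.newaxis, :]   # (1, n_points)
    h = np.asarray(height, dtype=float)[:, np.newaxis]        # (n_peaks, 1)
    c = np.asarray(centroid, dtype=float)[:, np.newaxis]
    w = np.asarray(width, dtype=float)[:, np.newaxis]
    # Temporary array of shape (n_peaks, n_points), summed over the peaks
    return np.sum(h * np.exp(-(wl - c) ** 2 / (2.0 * w ** 2)), axis=0)


def r_squared(flux, fitted):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    flux = np.asarray(flux, dtype=float)
    residual = np.sum((flux - fitted) ** 2)
    total = np.sum((flux - flux.mean()) ** 2)
    return 1.0 - residual / total
```

The temporary array has a fixed size of n_peaks times n_points and is released after each call, so memory use stays bounded instead of growing with every iteration.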
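Also, fitting 174 Gaussians is just a bit crazy; either look into another way of determining what you want to get out of your data, or split things up. Judging from the wavelengths variable, it looks like you are trying to fit lines in a high-resolution spectrum; it may be better to isolate most of the lines and fit those isolated groups separately. If they all overlap, I doubt any normal fitting technique will help you.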
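One possible way to split things up, sketched under the assumption that your initial `centroid` and `width` guesses are good enough to tell which lines blend: treat two peaks as overlapping when their centroid ± `n_sigma` * width ranges touch, and collect overlapping peaks into groups. The function name `group_peaks` and the default `n_sigma=3.0` are arbitrary choices for illustration, not something taken from your code.

```python
import numpy as np


def group_peaks(centroid, width, n_sigma=3.0):
    """Return lists of peak indices; peaks whose +/- n_sigma ranges overlap share a group."""
    centroid = np.asarray(centroid, dtype=float)
    width = np.abs(np.asarray(width, dtype=float))
    if centroid.size == 0:
        return []
    order = np.argsort(centroid)          # walk through the peaks from left to right
    first = order[0]
    groups = []
    current = [int(first)]
    right_edge = centroid[first] + n_sigma * width[first]
    for idx in order[1:]:
        left = centroid[idx] - n_sigma * width[idx]
        right = centroid[idx] + n_sigma * width[idx]
        if left <= right_edge:
            # Overlaps the running group: extend it
            current.append(int(idx))
            right_edge = max(right_edge, right)
        else:
            # Gap found: close the group and start a new one
            groups.append(current)
            current = [int(idx)]
            right_edge = right
    groups.append(current)
    return groups
```

Each group can then be fitted on its own, using only the wavelength window it covers, which keeps every individual fit small.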
Finally, perhaps a package like pandas can help (for example, its computation subpackage).
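One more last thing, since I see a lot that could be improved in the code: at some point, codereview may also be useful. For now, though, my guess is that your memory usage is the most problematic part.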