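I'm using PyMC3 to fit tennis players' ace rates with a Bayesian fit to a Beta curve. Each time the code loops over a player, memory usage increases a little. I'm trying to do this for 400+ players on 3 different surfaces, and after about 200 players I run out of memory. I don't understand why memory isn't released after each loop iteration, since I don't think I use any information from previous iterations.

I think the problem may be related to the trace. I saw a suggestion somewhere that I shouldn't write trace = pm.sample(...), but just pm.sample(...), and then retrieve that data after the program has run. I'm not sure how to implement that fix, and I was hoping for a more direct solution. I'd expect this to be a fairly common problem, although I haven't seen it asked about much online.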
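If the growth comes from state that the modelling library keeps alive inside the Python process, then dropping the `trace` variable may not be enough. One common workaround is to run each fit in a short-lived worker process and send back only the fitted number. The sketch below uses the standard-library `multiprocessing.Pool` with `maxtasksperchild=1` and `chunksize=1`, so every fit runs in a fresh process whose memory the OS reclaims when that process exits. `fit_one` and `fit_all` are hypothetical names. `fit_one` is a stand-in that returns the closed-form Beta-Binomial posterior mean instead of calling `pm.sample`, so the sketch runs without PyMC3. In real use, the `pm.Model()` block would go inside it and it would return only `np.mean(trace['prior'])`.

```python
import multiprocessing as mp


def fit_one(args):
    """Hypothetical per-cell fit (stand-in for the pm.Model()/pm.sample() block).

    It must return only a small summary (a float), never the trace itself,
    so nothing large is pickled back to the parent process.
    """
    prior_a, prior_b, srv_count, ace_count = args
    # Stand-in: closed-form posterior mean of a Beta prior with Binomial data
    return (prior_a + ace_count) / (prior_a + prior_b + srv_count)


def fit_all(jobs):
    # "fork" is POSIX-only; it is used here so the sketch also works when the
    # script is piped to the interpreter. On Windows, use mp.Pool directly.
    ctx = mp.get_context("fork")
    # maxtasksperchild=1 plus chunksize=1: each worker handles exactly one fit
    # and is then replaced, releasing everything that fit allocated.
    with ctx.Pool(processes=2, maxtasksperchild=1) as pool:
        return pool.map(fit_one, jobs, chunksize=1)


if __name__ == "__main__":
    # (prior_a, prior_b, srv_count, ace_count) for two hypothetical cells
    jobs = [(2.0, 8.0, 100, 15), (2.0, 8.0, 50, 0)]
    print(fit_all(jobs))
```

The parent process never imports or builds any model, so its memory stays flat no matter how many player/surface cells are fitted.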
The relevant bits of the code are shown below. Thanks in advance for your help.
import numpy as np
import pymc3 as pm
from scipy.stats import beta

# Fit the Beta prior on [0, 1]; beta.fit returns (a, b, loc, scale)
prior_parameters = beta.fit(chart_data, floc=0, fscale=1)
prior_a, prior_b = prior_parameters[0:2]

# j is the surface index, set by an enclosing loop that is not shown here
for i in range(server_by_surface_pct.shape[0]):
    # srv_count is the number of serves taken by player i on surface j
    srv_count = pivot_srv_count.iat[i, j]
    # Go to the next iteration if player i has no serves on surface j
    if np.isnan(srv_count):
        continue
    # ace_pct is the fraction of player i's serves on surface j that are aces
    ace_pct = server_by_surface_pct.iat[i, j]
    # Number of aces by player i on surface j
    ace_count = round(srv_count * ace_pct, 0)
    # Zero aces is possible, so replace NaN with zero
    if np.isnan(ace_count):
        ace_count = 0.0
    # Bayesian model: Beta prior on the ace rate, Binomial likelihood for the aces
    with pm.Model() as model:
        theta_prior = pm.Beta('prior', prior_a, prior_b)
        observations = pm.Binomial('obs', n=srv_count, p=theta_prior, observed=ace_count)
        start = pm.find_MAP()
        step = pm.NUTS(scaling=start)
        trace = pm.sample(1000, step=step, start=start, progressbar=True)
    # The mean of the trace is the fitted ace rate for player i on surface j
    server_by_surface_pct_fitted.iat[i, j] = np.mean(trace['prior'])
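A Beta(a, b) prior is conjugate to a Binomial likelihood. After k aces in n serves, the posterior is therefore Beta(a + k, b + n − k), and its mean is (a + k) / (a + b + n). This means the quantity that `np.mean(trace['prior'])` estimates is available in closed form, with no sampling and no per-player model, which avoids the memory growth altogether. The sketch below assumes the two tables have been converted to NumPy arrays of the same shape (players × surfaces), for example with `DataFrame.to_numpy()`. The function name `fitted_ace_pct` is hypothetical.

```python
import numpy as np


def fitted_ace_pct(srv_count, ace_pct, prior_a, prior_b):
    """Closed-form Beta-Binomial posterior mean for every player/surface cell.

    srv_count, ace_pct: arrays of the same shape. NaN in srv_count means
    "no serves" (the result is NaN, like the `continue` in the loop); NaN in
    ace_pct is treated as zero aces, as in the loop.
    """
    n = np.asarray(srv_count, dtype=float)
    p = np.nan_to_num(np.asarray(ace_pct, dtype=float), nan=0.0)
    k = np.round(n * p)  # rounded ace counts, like round(srv_count * ace_pct, 0)
    # NaN in n propagates through k and the division, so empty cells stay NaN
    return (prior_a + k) / (prior_a + prior_b + n)


# Usage with made-up numbers: 2 players x 2 surfaces, prior Beta(2, 8)
srv = np.array([[100.0, np.nan], [50.0, 40.0]])
pct = np.array([[0.15, np.nan], [np.nan, 0.10]])
print(fitted_ace_pct(srv, pct, 2.0, 8.0))
```

The MCMC mean should agree with this value up to Monte Carlo error. The loop is still useful if the model is later extended beyond a single Beta-Binomial pair, where no closed form exists.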