I am trying to recreate this density plot in Python 3: math.stackexchange.com/questions/845424/the-expected-outcome-of-a-random-game-of-chess
End Goal: I need my density plot to look like this
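The area under the blue curve equals the combined area under the red, green, and purple curves, because the individual outcomes (Draw, Black wins, and White wins) are subsets of the total (All).
How do I get Python to compute this and plot it accordingly?
Here is the .csv file of results_df after 1000 simulations: pastebin.com/YDVMx2DL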
from matplotlib import pyplot as plt
import seaborn as sns
black = results_df.loc[results_df['outcome'] == 'Black']
white = results_df.loc[results_df['outcome'] == 'White']
draw = results_df.loc[results_df['outcome'] == 'Draw']
win = results_df.loc[results_df['outcome'] != 'Draw']
Total = len(results_df.index)
Wins = len(win.index)
PercentBlack = "Black Wins ≈ %s" %('{0:.2%}'.format(len(black.index)/Total))
PercentWhite = "White Wins ≈ %s" %('{0:.2%}'.format(len(white.index)/Total))
PercentDraw = "Draw ≈ %s" %('{0:.2%}'.format(len(draw.index)/Total))
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' %(workers)
sns.distplot(results_df.moves, hist=False, label = "All")
sns.distplot(black.moves, hist=False, label=PercentBlack)
sns.distplot(white.moves, hist=False, label=PercentWhite)
sns.distplot(draw.moves, hist=False, label=PercentDraw)
plt.title(AllTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()
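The code above produces density curves without weights. I need to figure out how to weight the density curves accordingly while keeping my labels in the legend.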
density curves, no weights; help
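I also tried a frequency histogram, which scales the heights of the distributions correctly, but I would rather keep the 4 curves overlaid on each other for a cleaner look. I don't like this frequency plot, but it is my current workaround.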
results_df.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = "All")
draw.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentDraw)
white.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentWhite)
black.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentBlack)
plt.title(AllTitle)
plt.ylabel('Frequency')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()
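If someone could write Python 3 code that outputs the first plot, with 4 density curves that carry the correct subset weights and with the custom legend showing the percentages preserved, I would greatly appreciate it.
Once the density curves are plotted with the correct subset weights, I am also interested in Python 3 code that finds the coordinates of the maximum point of each density curve, which would show the most frequent number of moves once I scale this up to 500,000 iterations.
Thanks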
Answer 0 (score: 1)
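You need to be careful here. The plot you produced is correct: every curve shown is a probability density function of its underlying distribution.
In the plot you want, only the curve labeled "All" is a probability density function. The other curves are not.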
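To make that distinction concrete, here is a minimal sketch on synthetic stand-in data (not the question's CSV; the array names and Gumbel parameters are assumptions chosen for illustration). A KDE on its own has area 1, while the same curve multiplied by the subset's share of all samples has an area equal to that share.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one outcome subset and the remaining samples.
subset = rng.gumbel(80, 25, 1000)
rest = rng.gumbel(200, 46, 4000)
share = len(subset) / (len(subset) + len(rest))  # fraction of all samples

# Evaluate the KDE on a fine grid that comfortably covers the data.
grid = np.linspace(-200, 600, 4001)
dx = grid[1] - grid[0]
pdf = gaussian_kde(subset)(grid)

# Riemann sums approximate the areas under the two curves.
area_pdf = pdf.sum() * dx                 # close to 1: a proper density
area_weighted = (share * pdf).sum() * dx  # close to `share`: not a density

print("area under the KDE:", area_pdf)
print("area under the weighted curve:", area_weighted, "share:", share)
```

The weighted curve is still useful for comparing subsets on a common scale, but it should not be read as a probability density.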
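In any case, if you want the curves scaled as in the desired plot, you will need to compute the kernel density estimates yourself. This can be done with scipy.stats.gaussian_kde().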
To reproduce the desired plot, I see two options.
Calculate the KDE for all cases involved and scale each by its number of samples.
import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats
a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
both = np.hstack((a,b))
kdeboth = scipy.stats.gaussian_kde(both)
grid = np.arange(500)
#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(both)))
wb = kdeb(grid)*(len(b)/float(len(both)))
print("a.sum ", wa.sum())
print("b.sum ", wb.sum())
print("total.sum ", kdeboth(grid).sum())
fig, ax = plt.subplots()
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, kdeboth(grid), color="crimson", lw=2, label = "pdf")
plt.legend()
plt.show()
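With this approach the pooled curve and the sum of the weighted curves need not coincide exactly, because gaussian_kde chooses its bandwidth separately for each dataset (Scott's rule by default). The following is a rough sketch that measures the gap on the same kind of synthetic data; the grid limits are an assumption picked to cover the samples.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
a = rng.gumbel(80, 25, 1000)
b = rng.gumbel(200, 46, 4000)
both = np.hstack((a, b))

# Unit grid spacing, so a plain sum approximates the area under a curve.
grid = np.arange(-100, 900)

# Each subset KDE is scaled by its share of the pooled sample.
wa = gaussian_kde(a)(grid) * len(a) / len(both)
wb = gaussian_kde(b)(grid) * len(b) / len(both)
# KDE fitted directly to the pooled sample (its own bandwidth).
pooled = gaussian_kde(both)(grid)

max_gap = np.abs(pooled - (wa + wb)).max()
print("largest pointwise gap:", max_gap)
print("area of pooled curve:", pooled.sum())
print("area of summed curves:", (wa + wb).sum())
```

Both curves have an area of about 1, but their shapes differ slightly. If the "All" curve must equal the sum of the subset curves exactly, summing the weighted curves is the safer choice.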
Calculate the KDE for each individual case, weight each by its share of the samples, and sum them to obtain the total.
import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats
a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
grid = np.arange(500)
#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(a)+len(b)))
wb = kdeb(grid)*(len(b)/float(len(a)+len(b)))
total = wa+wb
fig, ax = plt.subplots(figsize=(5,3))
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, total, color="crimson", lw=2, label = "pdf")
plt.legend()
plt.show()
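The same weighting can be applied to a frame shaped like the question's results_df, and the peak of each curve can be read off with np.argmax. This is only a sketch under stated assumptions: the `outcome` and `moves` column names come from the question, but the data below is synthetic, and `weighted_kde_curves` and `curve_peak` are hypothetical helper names, not part of any library.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def weighted_kde_curves(df, grid):
    """Map each outcome to (share, KDE of its moves scaled by that share)."""
    total = len(df)
    curves = {}
    for outcome, group in df.groupby("outcome"):
        share = len(group) / total
        kde = gaussian_kde(group["moves"].to_numpy(dtype=float))
        curves[outcome] = (share, kde(grid) * share)
    return curves


def curve_peak(grid, curve):
    """Return the (x, y) coordinates of the highest grid point of a curve."""
    i = int(np.argmax(curve))
    return grid[i], curve[i]


# Synthetic stand-in for results_df (assumption: not the real simulation data).
rng = np.random.default_rng(0)
results_df = pd.DataFrame({
    "outcome": ["Black"] * 75 + ["White"] * 75 + ["Draw"] * 850,
    "moves": np.concatenate([
        rng.gumbel(80, 25, 75),
        rng.gumbel(80, 25, 75),
        rng.gumbel(200, 46, 850),
    ]).astype(int),
})

grid = np.arange(0, 700)  # unit spacing, one grid point per move count
curves = weighted_kde_curves(results_df, grid)
# The "All" curve is the sum of the weighted subset curves.
all_curve = sum(curve for _, curve in curves.values())

labels, peaks = {}, {}
for outcome, (share, curve) in curves.items():
    name = outcome if outcome == "Draw" else outcome + " Wins"
    labels[outcome] = "{} ≈ {:.2%}".format(name, share)
    peaks[outcome] = curve_peak(grid, curve)
    print(labels[outcome], "peaks near", peaks[outcome][0], "moves")

print("All peaks near", curve_peak(grid, all_curve)[0], "moves")
```

Each curve can then be drawn with `ax.plot(grid, curve, label=labels[outcome])`, in the same way as the `ax.plot` calls above, so the percentage labels stay in the legend. The peak is only resolved to the grid spacing, which is one move here.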