Python: overlap area of two sns distplots

Time: 2018-04-17 14:06:37

Tags: python seaborn

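How can I get the overlap area of two sns.distplots?

Besides the difference in means (shown below), I would like to add a number that describes the (normalized) difference between the distributions. For example, two distributions can have the same mean yet still look very different if they are not normally distributed.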

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Two samples of different size; the second is shifted by +1.
x1 = np.random.normal(size=2000)
x2 = np.random.normal(size=1000) + 1

# Draw only the KDE curves (note: distplot is deprecated since seaborn 0.11).
sns.distplot(x1, hist=False, kde=True, color="r", norm_hist=True)
sns.distplot(x2, hist=False, kde=True, color="b", norm_hist=True)

m1 = x1.mean()
m2 = x2.mean()

plt.title("m1={:2.2f}, m2={:2.2f} (diffInMean={:2.2f})".format(m1, m2, m1 - m2))

plt.show(block=True)

1 answer:

Answer 0 (score: 0)

In case anyone is interested: I now approximate it with the integral of the distributions (sadly not the one-liner I was looking for):

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(size=9000)
data2 = np.random.normal(size=5000, loc=0.5, scale=1.5)
num_bins = 100

# Shared bin edges covering both samples; num_bins bins need num_bins + 1 edges.
xmin = min(data1.min(), data2.min())
xmax = max(data1.max(), data2.max())
bins = np.linspace(xmin, xmax, num_bins + 1)

# Weight every observation by 1/n so that each histogram sums to 1.
weights1 = np.ones_like(data1) / len(data1)
weights2 = np.ones_like(data2) / len(data2)

hist_1 = np.histogram(data1, bins, weights=weights1)[0]
hist_2 = np.histogram(data2, bins, weights=weights2)[0]

# Total variation distance between the binned distributions; 1 - tvd is the overlap.
tvd = 0.5 * np.sum(np.abs(hist_1 - hist_2))
print("overlap: {:2.2f} percent".format((1 - tvd) * 100))

# Plot both normalized histograms on the same bins.
plt.figure()
ax = plt.gca()
ax.hist(data1, bins, weights=weights1, color='red', edgecolor='white', alpha=0.5)
ax.hist(data2, bins, weights=weights2, color='blue', edgecolor='white', alpha=0.5)
plt.show()
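
The question plots KDE curves rather than histograms, so the overlap could also be estimated from smoothed densities: the area under min(p, q), which for normalized densities equals 1 minus the total variation distance used above. The following is only a rough sketch under stated assumptions. The helper names `gaussian_kde_on_grid` and `kde_overlap` are hypothetical, the Gaussian kernel with a Scott's-rule bandwidth is an assumption that may not match the curve seaborn draws, and the integral is a plain Riemann sum on a uniform grid.

```python
import numpy as np


def gaussian_kde_on_grid(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth (an assumption, see the text above).
    bandwidth = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    # Standardized distance of every grid point to every sample, shape (len(grid), n).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)


def kde_overlap(a, b, num_points=512):
    """Approximate the area under min(p, q) for the KDEs of samples `a` and `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pad the grid so the tails of both estimates are included.
    pad = 3.0 * max(a.std(ddof=1), b.std(ddof=1))
    grid = np.linspace(min(a.min(), b.min()) - pad,
                       max(a.max(), b.max()) + pad,
                       num_points)
    dx = grid[1] - grid[0]
    p = gaussian_kde_on_grid(a, grid)
    q = gaussian_kde_on_grid(b, grid)
    # Riemann sum of the pointwise minimum of the two densities.
    return float(np.sum(np.minimum(p, q)) * dx)


rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
x2 = rng.normal(size=1000) + 1
print("KDE overlap: {:2.2f} percent".format(kde_overlap(x1, x2) * 100))
```

Unlike the histogram version, the result here depends on the bandwidth instead of the number of bins; in both cases the value is an estimate and shifts somewhat with the smoothing choice.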