How to compute the total variation distance in Python

Time: 2017-08-19 20:51:06

Tags: python statistics probability distribution

This link on total variation gives a distance between two probability distributions.
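For reference, the standard definition of the total variation distance, together with its equivalent form for distributions on a countable set, can be written as follows. The symbols P, Q, A and x are notation chosen here and may differ from the linked page.

```latex
\delta(P, Q) = \sup_{A} \left| P(A) - Q(A) \right|
             = \frac{1}{2} \sum_{x} \left| P(x) - Q(x) \right|
```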

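I am trying to compute it in Python. I have two datasets. First I estimate their probability distributions from histograms, then I try to take the maximum difference between the two distributions. But the value I get is very small, so it looks like I am doing something wrong. Can you help me fix it?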

import numpy as np
import scipy.stats as st
# original is a numpy array of shape [45222, 1]
# synthetic is a numpy array of shape [45222, 1]
summation = 0
minOriginal = min(original)
minGenerated = min(synthetic)

maxOriginal = max(original)
maxGenerated = max(synthetic)

minHist = min(minOriginal, minGenerated)
maxHist = max(maxOriginal, maxGenerated)

originalHist = np.histogram(original, range=(minHist, maxHist))
hist_dist1 = st.rv_histogram(originalHist)

generatedHist = np.histogram(synthetic, range=(minHist, maxHist))
hist_dist2 = st.rv_histogram(generatedHist)

x = np.linspace(minHist, maxHist, 45000)
summation += max(abs(hist_dist1.pdf(x)-hist_dist2.pdf(x)))
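One possible reason for the small value: the code above takes the maximum difference between two densities, and density values shrink as the data range widens, so the result depends on the scale of the data. A minimal sketch of an alternative is shown below. It assumes the total variation distance is estimated as half the sum of absolute differences between bin probabilities on shared bins. The function name `total_variation_distance`, the default of 10 bins, and the random sample data are illustrative assumptions, not part of the original code.

```python
import numpy as np


def total_variation_distance(a, b, bins=10):
    """Estimate the TV distance between two samples using shared histogram bins."""
    a = np.ravel(a)
    b = np.ravel(b)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Shared bin edges, so bin i covers the same interval for both samples.
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # Convert counts to bin probabilities (each vector sums to 1).
    p = p / p.sum()
    q = q / q.sum()
    # Half the L1 distance between the two discrete distributions.
    return 0.5 * np.abs(p - q).sum()


# Illustrative data only (assumed); replace with the real arrays.
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(45222, 1))
synthetic = rng.normal(0.5, 1.0, size=(45222, 1))
tvd = total_variation_distance(original, synthetic)
```

The result lies between 0 (identical histograms) and 1 (no overlapping bins). The estimate depends on the number of bins, so it is worth checking how stable it is for a few bin counts.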

0 answers:

No answers