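In this link, total variation is given as a distance between two probability distributions.
I am trying to compute it in Python. I have two datasets. First I compute their probability density functions from histograms, then I try to get the maximum difference between the two distributions. But it gives me a very small value, so it seems I am doing something wrong. Can you help me fix it?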
import numpy as np
import scipy.stats as st
# original is a NumPy array of shape (45222, 1)
# synthetic is a NumPy array of shape (45222, 1)
summation = 0
minOriginal = original.min()
minGenerated = synthetic.min()
maxOriginal = original.max()
maxGenerated = synthetic.max()
minHist = min(minOriginal, minGenerated)
maxHist = max(maxOriginal, maxGenerated)
originalHist = np.histogram(original, range=(minHist, maxHist))
hist_dist1 = st.rv_histogram(originalHist)
generatedHist = np.histogram(synthetic, range=(minHist, maxHist))
hist_dist2 = st.rv_histogram(generatedHist)
x = np.linspace(minHist, maxHist, 45000)
summation += max(abs(hist_dist1.pdf(x)-hist_dist2.pdf(x)))
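
For comparison, here is a minimal sketch of one common definition of total variation distance for binned data: half the sum of absolute differences between the bin probabilities of the two histograms, rather than the maximum difference between density values. Density values scale inversely with the data range, so their maximum pointwise difference is not bounded by 1 and can be very small when the range is wide. The function name `tv_distance_from_samples` and the default `bins=10` are illustrative assumptions, not part of the code above, and the sketch assumes the two samples do not all share a single value.

```python
import numpy as np

def tv_distance_from_samples(a, b, bins=10):
    """Estimate total variation distance between two samples using shared bins.

    Hypothetical helper: assumes the combined data span a non-zero range.
    """
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()

    # Use the same bin edges for both samples so bin i covers the same interval.
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)

    counts_a, _ = np.histogram(a, bins=edges)
    counts_b, _ = np.histogram(b, bins=edges)

    # Convert counts to bin probabilities (each vector sums to 1).
    p = counts_a / counts_a.sum()
    q = counts_b / counts_b.sum()

    # Total variation distance for discrete distributions: half the L1 distance.
    return 0.5 * np.abs(p - q).sum()
```

With the arrays from the question, this would be called as `tv_distance_from_samples(original, synthetic)`. The result lies between 0 (identical histograms) and 1 (no overlapping bins), and it depends on the chosen number of bins.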