I have a dataset with 683 samples and 9 features. I want to compare the KL divergence between two datasets, column by column.
import numpy as np
import scipy.stats as st

summation = 0
x = np.linspace(-5, 5, 100)
for i in range(col):  # loop over the 9 feature columns
    # column i of each dataset as a 1-D array of length `row` (683)
    originalAttribute = np.asarray(originalData[:, i]).reshape(row)
    histOriginal = np.histogram(originalAttribute, bins=binSize)
    hist_original_dist = st.rv_histogram(histOriginal)
    generatedAttribute = np.asarray(generatedData[:, i]).reshape(row)
    histGenerated = np.histogram(generatedAttribute, bins=binSize)
    hist_generated_dist = st.rv_histogram(histGenerated)
    # st.entropy(p, q) computes the KL divergence of p from q
    summation += st.entropy(hist_original_dist.pdf(x), hist_generated_dist.pdf(x))
It returns inf, but I think I am doing something wrong. Also, hist_original_dist.pdf(x) gives me values such as 2.65, which I thought should not be possible for a pdf in Python.
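For context, here is a self-contained sketch of what I think is going on, using synthetic stand-in data rather than my real columns (the sample arrays, bin count, and epsilon below are illustrative assumptions): if both samples are binned on shared edges, the inf seems to come from bins where the generated histogram is zero while the original is not, and a small epsilon avoids that.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, 683)    # stand-in for one real column
generated = rng.normal(0.2, 1.1, 683)   # stand-in for the generated column

# Use the same bin edges for both samples so bin i means the same interval.
edges = np.histogram_bin_edges(np.concatenate([original, generated]), bins=20)
p, _ = np.histogram(original, bins=edges, density=True)
q, _ = np.histogram(generated, bins=edges, density=True)

# A zero bin in q where p > 0 makes KL divergence infinite;
# a tiny epsilon keeps it finite. st.entropy normalizes its inputs.
eps = 1e-10
kl = st.entropy(p + eps, q + eps)
```

As for the 2.65: that looks legal to me, since a pdf is a density, not a probability. It can exceed 1 over bins narrower than 1, as long as it integrates to 1 overall.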