Question

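Since I am using TsFresh and its feature selection method in my thesis, I need to describe the methods used to detect correlations between features and labels. The procedure TsFresh uses for the "continuous feature - continuous label" case is the Kendall tau rank correlation coefficient as implemented in scipy.stats.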

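The problem here is that Kendall's tau is normally used for ordinal data. If it is applied to continuous data, finding concordant pairs seems rather unlikely to me. So I thought the continuous data might somehow be "binned". The following code snippet is part of the scipy function and is where (in my opinion) the magic happens: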

size = x.size
perm = np.argsort(y)  # sort on y and convert y to dense ranks
x, y = x[perm], y[perm]
y = np.r_[True, y[1:] != y[:-1]].cumsum(dtype=np.intp)

# stable sort on x and convert x to dense ranks
perm = np.argsort(x, kind='mergesort')
x, y = x[perm], y[perm]
x = np.r_[True, x[1:] != x[:-1]].cumsum(dtype=np.intp)

Can someone explain to me how this code "bins" the data, or "converts it to dense ranks"?
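To make the question concrete, here is a minimal sketch of what the `np.r_[True, ...].cumsum(...)` idiom from the snippet computes when applied to a single array. The helper name `dense_ranks` and the sample arrays are made up for illustration. Unlike the scipy snippet, which keeps both arrays in sorted order, this sketch maps the ranks back to the original positions so the result is easier to read.

```python
import numpy as np

def dense_ranks(values):
    """Hypothetical helper: dense ranks via the same idiom as the snippet."""
    values = np.asarray(values)
    order = np.argsort(values, kind="mergesort")  # stable sort
    sorted_vals = values[order]
    # True for the first element and wherever the sorted value changes,
    # so the running sum increases by one for every new distinct value.
    is_new = np.r_[True, sorted_vals[1:] != sorted_vals[:-1]]
    ranks = np.empty(values.size, dtype=np.intp)
    ranks[order] = is_new.cumsum(dtype=np.intp)  # undo the sort
    return ranks

continuous = np.array([0.31, 2.75, 0.30, 1.18])  # all values distinct
with_ties = np.array([0.5, 0.5, 1.2, 0.1])       # one repeated value

print(dense_ranks(continuous))  # [2 4 1 3]
print(dense_ranks(with_ties))   # [2 2 3 1]
```

In this sketch only exactly equal values share a rank. Nearby but distinct values such as 0.30 and 0.31 still get different ranks, so nothing resembling binning into intervals takes place here.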

Maybe I am completely on the wrong track; if so, please correct me.

What does scipy.stats.kendalltau do to continuously scaled data?
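One way to probe this empirically is to count concordant and discordant pairs by brute force, once on the raw continuous values and once on their ranks. The sketch below does this with NumPy only. The function name `tau_by_pair_counting` and the random test data are assumptions for illustration, and the formula is the plain tie-free definition (tau-a), not scipy's implementation.

```python
from itertools import combinations

import numpy as np

def tau_by_pair_counting(x, y):
    """Tie-free Kendall tau: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        # +1 if the pair is ordered the same way in x and y, -1 if opposite
        score += int(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j]))
    return score / (n * (n - 1) / 2)

rng = np.random.default_rng(0)
x = rng.normal(size=30)      # continuous feature (made-up data)
y = x + rng.normal(size=30)  # continuous label correlated with x

# Ordinal ranks; for tie-free data these order the points like dense ranks do.
x_ranks = np.argsort(np.argsort(x))
y_ranks = np.argsort(np.argsort(y))

tau_raw = tau_by_pair_counting(x, y)
tau_ranked = tau_by_pair_counting(x_ranks, y_ranks)
print(tau_raw, tau_ranked)
```

Both calls give the same value, because whether a pair is concordant depends only on the ordering of the points and not on how far apart the values are. For tie-free data such as this, the result should also match the first value returned by `scipy.stats.kendalltau(x, y)`, since the default tau-b variant reduces to this formula when there are no ties.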

0 answers: