Scatter plot colored by distance to neighbors

Date: 2018-05-21 15:00:18

Tags: r cluster-analysis distance

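I'm trying to make a plot from tSNE results in which each point is colored by its neighborhood density, i.e. by the number of neighbors around the point and the distance to those neighbors.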

Given the matrix of tSNE coordinates:

            [,1]       [,2]
  [1,] -4.2060515  3.1718312
  [2,] -4.2671476  5.6677296
  [3,] -3.1792470  3.5504695
  [4,] -3.2507526  4.7510075
  [5,] -4.5662531  3.3866132
  [6,] -5.0863544  3.1760014
  [7,] -4.7380256  5.5291478
  [8,] -5.0510355  5.0373626
  [9,] -4.3288679  4.3316772
 [10,] -5.2947188  4.6130757
[etc,] ...         ...

I'd like to be able to color the points according to the criteria above.
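To experiment with the "number of neighbors" part of the criterion, the ten rows shown above can be typed in directly (the rows elided as `[etc,]` are not reproduced). A minimal sketch, assuming a fixed radius is acceptable: count, for each point, how many other points lie within distance `r`. The value `r = 1` is an arbitrary assumption that would need tuning to the scale of the embedding.

```r
# The ten tSNE coordinates listed in the question (remaining rows not shown there)
best.tsne <- matrix(c(-4.2060515, 3.1718312,
                      -4.2671476, 5.6677296,
                      -3.1792470, 3.5504695,
                      -3.2507526, 4.7510075,
                      -4.5662531, 3.3866132,
                      -5.0863544, 3.1760014,
                      -4.7380256, 5.5291478,
                      -5.0510355, 5.0373626,
                      -4.3288679, 4.3316772,
                      -5.2947188, 4.6130757),
                    ncol = 2, byrow = TRUE)

# Full pairwise Euclidean distance matrix (n x n, zeros on the diagonal)
D <- as.matrix(dist(best.tsne, method = "euclidean"))

# Neighbors within radius r; "- 1" removes the point itself (its diagonal 0 is always <= r)
r <- 1  # assumed radius, tune to your embedding
neighbor.count <- rowSums(D <= r) - 1
neighbor.count
```

A fixed radius is simple but sensitive to the chosen `r`; the k-nearest-neighbor and kernel approaches avoid picking a radius explicitly.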

But so far all I've managed is this, which is just the average Euclidean distance, and it isn't right:

[image: current plot]

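Ideally I'd like something that looks like this rough mockup, where points that are close together are colored darker than points with fewer local neighbors: [image: mockup]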
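Whatever score ends up being used, the "denser is darker" look of the mockup only requires a score that increases with density, mapped onto a light-to-dark palette. A sketch reusing the question's `colorRampPalette(c("white", "blue"))` idea; the helper name `density.colors` and the toy scores are illustrative assumptions.

```r
# Map a per-point density score (higher = denser) onto a white-to-blue ramp,
# so points in crowded regions are drawn darker.
density.colors <- function(score, n.col = 99) {  # hypothetical helper
  palette <- colorRampPalette(c("white", "blue"))(n.col)
  # n.col + 1 equally spaced breaks give n.col bins, one per palette entry
  # (assumes score is not constant, otherwise the breaks are not unique)
  bins <- cut(score,
              breaks = seq(min(score), max(score), length.out = n.col + 1),
              include.lowest = TRUE)
  palette[as.integer(bins)]
}

score <- c(0.1, 0.5, 0.9, 0.3)   # toy scores, for illustration only
cols <- density.colors(score)
cols
```

If a score is "higher = sparser" (like a sum of distances), negate it or reverse the palette before mapping.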

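# best.tsne: n x 2 matrix of tSNE coordinates (as above)
d <- dist(best.tsne, method = "euclidean")
# quick.scale() is not base R; assumed stand-in: min-max rescale to [floor, ceiling]
quick.scale <- function(x, floor = 0, ceiling = 1)
  floor + (x - min(x)) / (max(x) - min(x)) * (ceiling - floor)
# Sum of each point's distances to all others (large = isolated), scaled to [0, 1]
d.scaled <- quick.scale(apply(as.matrix(d), 2, sum),
                        floor = 0, ceiling = 1)
# 100 breaks give 99 bins, one per palette color
ii <- cut(d.scaled,
          breaks = seq(min(d.scaled), max(d.scaled), len = 100),
          include.lowest = TRUE)
colors <- colorRampPalette(c("white", "blue"))(99)[ii]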

I can handle assigning the colors and so on; I just need a way to compute the score.
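One common way to fold "how many neighbors" and "how far away" into a single score is the mean distance to each point's k nearest neighbors, inverted so that denser points score higher. This is a swapped-in technique, not the asker's or answerer's method; `k = 3`, the helper name `knn.density` and the toy data are assumptions.

```r
# k-nearest-neighbor density score: inverse of the mean distance to the k closest points
knn.density <- function(coords, k = 3) {   # hypothetical helper; k is an assumed default
  D <- as.matrix(dist(coords))
  # Sort each row and skip the first entry (the point itself, distance 0); needs k < n
  knn.mean <- apply(D, 1, function(row) mean(sort(row)[2:(k + 1)]))
  1 / knn.mean   # larger value = closer neighbors = denser region
}

set.seed(1)
# Toy data: a tight cluster of 20 points plus 5 scattered ones
coords <- rbind(matrix(rnorm(40, sd = 0.2), ncol = 2),
                matrix(runif(10, min = -5, max = 5), ncol = 2))
score <- knn.density(coords, k = 3)
tapply(score, rep(c("cluster", "scattered"), c(20, 5)), mean)
```

Exact duplicate points would give a mean distance of 0 and an infinite score, so duplicates may need jittering or removal first.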

1 answer:

Answer 0 (score: 1)

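There are many approaches, but the most common are to use a two-dimensional kernel density estimate, or to build a measure like the one you made but better suited to the data.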

Here are a few examples:

1 - Two-dimensional kernel density:

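# With kde2d {MASS}
library(MASS)  # provides kde2d() and the geyser data set
# Refer to columns explicitly instead of attach(), which leaves geyser on the search path
plot(geyser$duration, geyser$waiting, xlim = c(0.5, 6), ylim = c(40, 100))
f1 <- kde2d(geyser$duration, geyser$waiting, n = 50, lims = c(0.5, 6, 40, 100))
image(f1)  # density surface on a 50 x 50 grid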
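`image(f1)` draws the density surface, but coloring the points themselves needs a density value per point. A minimal sketch, assuming a nearest-grid-node lookup is precise enough (a shortcut of my own; bilinear interpolation would be smoother): `findInterval()` on the midpoints between grid nodes finds each observation's nearest node on the `kde2d()` grid. The helper `nearest.node` is hypothetical.

```r
library(MASS)   # kde2d() and the geyser data

# 2-D kernel density estimate on a 50 x 50 grid, as in the answer
f1 <- kde2d(geyser$duration, geyser$waiting, n = 50, lims = c(0.5, 6, 40, 100))

# Nearest grid node: midpoints between nodes act as breaks, so findInterval() + 1
# gives the index of the closest node (values outside the grid clamp to the edge)
nearest.node <- function(v, grid) findInterval(v, (grid[-1] + grid[-length(grid)]) / 2) + 1
ix <- nearest.node(geyser$duration, f1$x)
iy <- nearest.node(geyser$waiting,  f1$y)
point.density <- f1$z[cbind(ix, iy)]   # rows of z follow x, columns follow y

# Darker = denser, with the same white-to-blue ramp as in the question
pal  <- colorRampPalette(c("white", "blue"))(99)
cols <- pal[cut(point.density, 99, labels = FALSE)]
plot(geyser$duration, geyser$waiting, pch = 21, bg = cols,
     xlim = c(0.5, 6), ylim = c(40, 100))
```

For the tSNE data, `best.tsne[, 1]` and `best.tsne[, 2]` would take the place of the geyser columns, with `lims` adjusted to their range.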

2 - Ad hoc measure (1):

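# 20% trimmed mean of each point's distances
# (mean()'s trim is the fraction cut from EACH end; trim >= 0.5 just returns the median)
apply(as.matrix(d), 2, mean, trim = 0.2)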
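The `trim` argument of `mean()` is the fraction dropped from each end and is only meaningful between 0 and 0.5; larger values are clamped, so the result becomes the median. A quick check on made-up numbers (not taken from the question's data):

```r
x <- c(1, 2, 4, 7, 100)   # made-up distances with one large outlier

mean(x)                   # 22.8: the outlier dominates
mean(x, trim = 0.2)       # drops one value from each end: mean(2, 4, 7) = 4.333333
mean(x, trim = 0.8)       # trim >= 0.5 is clamped, giving the median
median(x)                 # 4
```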

3 - Ad hoc measure (2):

# Normalized inverse distance, max(1 + d) / (1 + d), averaged per point (higher = denser)
apply(as.matrix(1/((1+d)/max(1+d))), 2, mean)
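Since `1/((1+d)/max(1+d))` simplifies to `max(1+d)/(1+d)`, each column mean grows when a point has many close neighbors. A sketch checking that on made-up points (three close together and one far away; the coordinates are arbitrary assumptions), with `colMeans()` as an equivalent shorthand for `apply(..., 2, mean)`:

```r
# Toy coordinates: three points close together and one far away (arbitrary values)
pts <- matrix(c(0,   0,
                0.1, 0,
                0,   0.1,
                5,   5), ncol = 2, byrow = TRUE)
d <- dist(pts, method = "euclidean")

# The answer's expression and its simplified form give the same scores
# (as.matrix() on a dist object puts 0 on the diagonal in both cases)
s1 <- apply(as.matrix(1/((1+d)/max(1+d))), 2, mean)
s2 <- colMeans(as.matrix(max(1 + d) / (1 + d)))
all.equal(s1, s2)
s1   # the isolated 4th point gets the lowest score
```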

Regards!!