R interclass distance matrix

Time: 2016-08-22 23:30:16

Tags: r distance-matrix

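This question is a follow-up to "how to extract intragroup and intergroup distances from a distance matrix? in R". In that question, they first computed the distance matrix for all points and then simply extracted the interclass distance matrix. I have a situation where I'd like to bypass the initial computation and skip right to the extraction, i.e. I want to compute the interclass distance matrix directly. Adapting the linked example, let's assume I have some data in a data frame called df: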

values<-c(0.002,0.3,0.4,0.005,0.6,0.2,0.001,0.002,0.3,0.01)
class<-c("A","A","A","B","B","B","B","A","B","A")
df<-data.frame(values, class)

What I'd like is the distance matrix:

    1    2    3    8   10
4 .003 .295 .395 .003 .005
5 .598 .300 .200 .598 .590
6 .198 .100 .200 .198 .190
7 .001 .299 .399 .001 .009
9 .298 .000 .100 .298 .290

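Is there already an elegant and fast way to do this in R?

EDIT: After receiving a good solution for the 1D case above, I thought of a bonus question: what about the higher-dimensional case, say if df instead looks like this: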

values1<-c(0.002,0.3,0.4,0.005,0.6,0.2,0.001,0.002,0.3,0.01)
values2<-c(0.001,0.1,0.1,0.001,0.1,0.1,0.001,0.001,0.1,0.01)
class<-c("A","A","A","B","B","B","B","A","B","A")
df<-data.frame(values1, values2, class)

Again, I'm interested in getting the matrix of Euclidean distances between the points in class B and the points in class A.

2 answers:

Answer 0: (score: 3)

For the general n-dimensional Euclidean distance, we can exploit the equation (not R, but algebra):

square_dist(b,a) = sum_i(b[i]*b[i]) + sum_i(a[i]*a[i]) - 2*inner_prod(b,a)

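where the sums run over the dimensions i = 1, ..., n of the vectors a and b. Here, a and b are one pair of points from A and B. The key is that this equation can be written as a matrix equation for all pairs in A and B.

In code: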
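Written out for all pairs at once, the identity might look as follows. This is a sketch in notation introduced here, not taken from the original answer: B and A are assumed to be the matrices whose rows b_j and a_k are the points of each class, and D is the matrix of distances being sought.

```latex
D^2_{jk} \;=\; \sum_{i=1}^{n} b_{ji}^2 \;+\; \sum_{i=1}^{n} a_{ki}^2 \;-\; 2\,\bigl(B A^{\mathsf{T}}\bigr)_{jk}
```

The first two terms depend on only one point each, so they can be computed once per row of B and A; only the last term couples the two classes, and it is a single matrix product.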

## First split the data with respect to the class
n <- 2   ## the number of dimensions, for this example is 2
tmp <- split(df[,1:n], df$class)

d <- sqrt(matrix(rowSums(expand.grid(rowSums(tmp$B*tmp$B),rowSums(tmp$A*tmp$A))),
                 nrow=nrow(tmp$B)) - 
          2. * as.matrix(tmp$B) %*% t(as.matrix(tmp$A)))

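Notes:

  1. The inner rowSums compute sum_i(b[i]*b[i]) and sum_i(a[i]*a[i]) for each b in B and each a in A, respectively.
  2. expand.grid then generates all pairs between B and A.
  3. The outer rowSums computes sum_i(b[i]*b[i]) + sum_i(a[i]*a[i]) for all of these pairs.
  4. This result is then reshaped into a matrix. Note that the number of rows of this matrix is the number of points in class B, as you requested.
  5. Then subtract twice the inner product of all pairs. This inner product can be written as the matrix multiplication tmp$B %*% t(tmp$A), where I left out the coercion to matrix for clarity.
  6. Finally, take the square root.

Using this code with your data: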

    print(d)
    ##          1         2         3         8         10
    ##4 0.0030000 0.3111688 0.4072174 0.0030000 0.01029563
    ##5 0.6061394 0.3000000 0.2000000 0.6061394 0.59682493
    ##6 0.2213707 0.1000000 0.2000000 0.2213707 0.21023796
    ##7 0.0010000 0.3149635 0.4110985 0.0010000 0.01272792
    ##9 0.3140143 0.0000000 0.1000000 0.3140143 0.30364453
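The numbered steps above can also be written with named intermediates. This is a minimal sketch, not the answerer's code: the helper name `cross_dist` is hypothetical, `outer(bb, aa, "+")` is swapped in for the `expand.grid`/`matrix` reshaping, and the `pmax(..., 0)` clamp is an added assumption to guard against tiny negative values from floating-point rounding before `sqrt`.

```r
# Hypothetical helper (not from the original answer): Euclidean distances
# between every row of B (rows of the result) and every row of A (columns).
cross_dist <- function(B, A) {
  B <- as.matrix(B)
  A <- as.matrix(A)
  bb <- rowSums(B * B)                        # sum_i(b[i]*b[i]) for each b in B
  aa <- rowSums(A * A)                        # sum_i(a[i]*a[i]) for each a in A
  sq <- outer(bb, aa, "+") - 2 * B %*% t(A)   # squared distances for all pairs
  sqrt(pmax(sq, 0))                           # clamp rounding noise below zero (assumption)
}

values1 <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
values2 <- c(0.001, 0.1, 0.1, 0.001, 0.1, 0.1, 0.001, 0.001, 0.1, 0.01)
class   <- c("A", "A", "A", "B", "B", "B", "B", "A", "B", "A")
df <- data.frame(values1, values2, class)

tmp <- split(df[, 1:2], df$class)
d <- cross_dist(tmp$B, tmp$A)
```

Because `split` keeps the original row names, the rows and columns of `d` should be labelled with the row numbers of the class B and class A points, as in the printed output above.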
    

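Note that this code works for any n > 1. We can recover your previous 1-d result by setting n to 1 and not performing the inner rowSums (because tmp$A and tmp$B then contain only one column each).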
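The code for the 1-d case is not preserved here, so the following is only a sketch of what the described change could look like, not the answerer's original. It assumes the 1-d data frame from the question; the `pmax(..., 0)` clamp and the `dimnames` labelling are additions made here.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
class  <- c("A", "A", "A", "B", "B", "B", "B", "A", "B", "A")
df <- data.frame(values, class)

# With a single column, split() returns plain numeric vectors, so the inner
# rowSums is dropped and the squares are used directly.
tmp <- split(df$values, df$class)

sq <- matrix(rowSums(expand.grid(tmp$B * tmp$B, tmp$A * tmp$A)),
             nrow = length(tmp$B)) -
      2 * as.matrix(tmp$B) %*% t(as.matrix(tmp$A))
d1 <- sqrt(pmax(sq, 0))   # clamp added here to avoid NaN from rounding (assumption)

# Label rows/columns with the original row numbers (added for readability).
dimnames(d1) <- list(which(class == "B"), which(class == "A"))
```

In one dimension the result is just the absolute difference of each pair of values, which makes it easy to check against the matrix requested in the question.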

Answer 1: (score: 2)

Here's an attempt that generates every combination and then simply takes the difference between each pair of values:

abs(matrix(Reduce(`-`, expand.grid(split(df$values, df$class))), nrow=5, byrow=TRUE))
#      [,1]  [,2]  [,3]  [,4]  [,5]
#[1,] 0.003 0.295 0.395 0.003 0.005
#[2,] 0.598 0.300 0.200 0.598 0.590
#[3,] 0.198 0.100 0.200 0.198 0.190
#[4,] 0.001 0.299 0.399 0.001 0.009
#[5,] 0.298 0.000 0.100 0.298 0.290
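A possible variant of the same idea, sketched here rather than taken from the answer, uses base R's `outer` so that the number of rows does not need to be hard-coded and the original row numbers are kept as labels. Naming the values with `setNames` before splitting is an assumption made for this sketch.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
class  <- c("A", "A", "A", "B", "B", "B", "B", "A", "B", "A")
df <- data.frame(values, class)

# Name each value by its row number so the labels survive split().
v <- split(setNames(df$values, rownames(df)), df$class)

# Rows are the class B points, columns the class A points.
d_outer <- abs(outer(v$B, v$A, "-"))
```

This only covers the 1-d case; for several columns the matrix-product approach from the other answer is the more natural fit.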