Question

I have a two-column data frame in which both variables are factors:

DF

PLOT  INTERACTION 
 A    interact_type_1
 A    interact_type_2
 B    interact_type_3
 B    interact_type_4 
 C    interact_type_1
 D    interact_type_4
 E    interact_type_1
 E    interact_type_2
 E    interact_type_3
 E    interact_type_4

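I need a pairwise matrix whose n rows and n columns are the unique levels of variable 1 (PLOT). Each cell should hold the count of INTERACTION values that the two PLOT levels have in common. Because it is a similarity matrix, only the lower triangle is filled; the diagonal (a PLOT compared with itself) and the upper triangle are NA. For this example, the output matrix would look like this: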

output

   A   B   C   D   E
A  NA  NA  NA  NA  NA
B  0   NA  NA  NA  NA
C  1   0   NA  NA  NA
D  0   1   0   NA  NA
E  2   2   1   1   NA

I tried changing it from long to wide format and then using a loop:

 df<- spread(df, df$PLOT, df$INTERACTION) 


 similarity.matrix<-matrix(nrow=ncol(F.data),ncol=ncol(F.data))


 for (col in 1:ncol(F.data)) {
  matches<-F.data[,col]==F.data
  match.counts<-colSums(matches)
  match.counts[col]<-0 # Set the same column comparison to zero.
  similarity.matrix[,col]<-match.counts
   }

But I get an error on the first line: Error: Invalid column specification.

Thank you for your time and help!

Answer 1

You can do it like this, where d is your data frame. First cross-tabulate PLOT against INTERACTION:

x = xtabs(~PLOT+INTERACTION,d)
        INTERACTION
    PLOT interact_type_1 interact_type_2 interact_type_3 interact_type_4
       A               1               1               0               0
       B               0               0               1               1
       C               1               0               0               0
       D               0               0               0               1
       E               1               1               1               1

Use combn to find all combinations of two PLOT levels:

n = length(unique(d$PLOT))
c = combn(1:n,2)
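
The order of the pairs matters for the next step. combn(1:n, 2) lists the pairs as (1,2), (1,3), ..., (n-1,n), which is the same order in which R walks the lower triangle of a matrix column by column. A minimal sketch to check this, assuming n = 5 as in the example (the names pairs, idx and same_order are only illustrative):

```r
n <- 5
pairs <- combn(1:n, 2)   # 2 x 10 matrix, one pair per column

# Positions of the lower triangle, in R's column-major fill order
idx <- which(lower.tri(matrix(nrow = n, ncol = n)), arr.ind = TRUE)

# Each lower-triangle cell (row, col) lines up with the pair (col, row)
same_order <- all(idx[, "col"] == pairs[1, ]) && all(idx[, "row"] == pairs[2, ])
same_order
```

If same_order is TRUE, a vector computed over the columns of the combn result can be assigned straight into the lower triangle.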

Then build your matrix and fill its lower triangle:

m = matrix(nrow=n,ncol=n)
## for each possible combination of two present in c, we find for the corresponding rows in x how many 1s they have in common using sum(x[y[1],]*x[y[2],])
m[lower.tri(m)] = apply(c,2,function(y) sum(x[y[1],]*x[y[2],]))

This returns:

      [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]    0   NA   NA   NA   NA
[3,]    1    0   NA   NA   NA
[4,]    0    1    0   NA   NA
[5,]    2    2    1    1   NA
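
The matrix above has no row or column names. The following is a self-contained sketch that rebuilds the example data as d and labels the result with the PLOT levels. It is an alternative rather than the method shown above: it replaces the combn step with tcrossprod(), which multiplies the cross-table by its own transpose and so gives the shared-interaction count for every pair at once.

```r
# Example data from the question
d <- data.frame(
  PLOT = factor(c("A", "A", "B", "B", "C", "D", "E", "E", "E", "E")),
  INTERACTION = factor(paste0("interact_type_", c(1, 2, 3, 4, 1, 4, 1, 2, 3, 4)))
)

# PLOT x INTERACTION incidence table (1 where the pair occurs)
x <- xtabs(~ PLOT + INTERACTION, d)

# x %*% t(x): cell [i, j] is the number of interactions shared by plots i and j
m <- tcrossprod(x)
dimnames(m) <- list(rownames(x), rownames(x))

# Keep only the lower triangle, as requested in the question
m[upper.tri(m, diag = TRUE)] <- NA
m
```

As with the combn version, this assumes each PLOT and INTERACTION pair appears at most once in the data; repeated rows would inflate the counts.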

How do I create a pairwise matrix with counts of matching entries, comparing all levels of a factor in a data frame?
