Question

Hi,

I would like to compute a similarity index that equals 1 when two rows are similar and -1 when they are not.

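I would like to get a matrix of these values, with one row and one column per echant (NA on the diagonal would be fine too; it doesn't matter).

I tried the simil function from the proxy package on this data: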

  dataR<- read.table(text='
    echant  espece
    ech1    esp1
    ech2    esp2
    ech3    esp2
    ech4    esp3
    ech5    esp3
    ech6    esp4
    ech7    esp4', header=TRUE)

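But the result is not exactly what I want. 1) For example, the similarity between ech2 and ech3 is 0.5, and the diagonal is 0; it is also 0 when there is no similarity. 2) The ech labels are lost. 3) ... In addition, I can't save it in .csv format.

Does anyone have a suggestion? Thanks a lot!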

Answer 1

The matrix described in the post can be obtained with:

same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
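To see why this works, here is a small sketch on a made-up three-element vector (x is only an illustration, not part of the question's data). outer() with "==" returns a logical matrix, and multiplying by 2 and subtracting 1 maps TRUE to 1 and FALSE to -1.

```r
# Hypothetical toy vector, used only to illustrate the arithmetic
x <- c("a", "b", "b")

eq <- outer(x, x, "==")   # logical matrix: TRUE where x[i] equals x[j]
pm <- eq * 2 - 1          # arithmetic coerces TRUE to 1 and FALSE to 0, giving 1 / -1

pm
```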

To name the rows and columns as described in the post, use rownames and colnames.

rownames(same.mat) <- colnames(same.mat) <- dataR$echant
same.mat
#     ech1 ech2 ech3 ech4 ech5 ech6 ech7
#ech1    1   -1   -1   -1   -1   -1   -1
#ech2   -1    1    1   -1   -1   -1   -1
#ech3   -1    1    1   -1   -1   -1   -1
#ech4   -1   -1   -1    1    1   -1   -1
#ech5   -1   -1   -1    1    1   -1   -1
#ech6   -1   -1   -1   -1   -1    1    1
#ech7   -1   -1   -1   -1   -1    1    1

Another approach could be:

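# factor() is needed when espece is read as character: as.numeric("esp1") would give NA
same.mat <- (as.matrix(dist(as.numeric(factor(dataR$espece)))) == 0) * 2 - 1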
rownames(same.mat) <- colnames(same.mat) <- dataR$echant
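The question also mentions NA on the diagonal and saving the result as .csv. The following is a minimal sketch of both steps, assuming dataR is defined as in the question; out_file is a placeholder temporary path, to be replaced with a real file name.

```r
dataR <- read.table(text = '
    echant  espece
    ech1    esp1
    ech2    esp2
    ech3    esp2
    ech4    esp3
    ech5    esp3
    ech6    esp4
    ech7    esp4', header = TRUE)

same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
rownames(same.mat) <- colnames(same.mat) <- dataR$echant

diag(same.mat) <- NA                      # optional: blank out the self-comparisons

out_file <- tempfile(fileext = ".csv")    # placeholder path (assumption)
write.csv(same.mat, out_file)             # row names are written as the first column

# Read the file back to check that labels and values survived the round trip
back <- as.matrix(read.csv(out_file, row.names = 1))
```

Because same.mat is an ordinary matrix with dimnames, write.csv keeps the ech labels in both the header row and the first column.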

Answer 2

There are no doubt more compact ways to do this:

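library(dplyr)   # provides %>%, mutate_at and vars
library(tidyr)   # provides spread
# Cells left empty by spread() are NA: map them to -1, everything else to 1
same <- function(x) { ifelse(is.na(x), -1, 1) }
spread(dataR, espece, espece) %>% 
  mutate_at(vars(-echant), same)   # funs() is deprecated; pass the function directly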
##   echant esp1 esp2 esp3 esp4
## 1   ech1    1   -1   -1   -1
## 2   ech2   -1    1   -1   -1
## 3   ech3   -1    1   -1   -1
## 4   ech4   -1   -1    1   -1
## 5   ech5   -1   -1    1   -1
## 6   ech6   -1   -1   -1    1
## 7   ech7   -1   -1   -1    1
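Note that this table has one column per espece rather than one per echant. As a hedged base-R sketch (not the answerer's method), a 0/1 sample-by-species indicator can be turned into the sample-by-sample matrix with tcrossprod(). It assumes dataR is defined as in the question and that each echant appears exactly once.

```r
dataR <- read.table(text = '
    echant  espece
    ech1    esp1
    ech2    esp2
    ech3    esp2
    ech4    esp3
    ech5    esp3
    ech6    esp4
    ech7    esp4', header = TRUE)

# 7 x 4 indicator: 1 where a sample belongs to a species, 0 otherwise
ind <- unclass(table(dataR$echant, dataR$espece))

# ind %*% t(ind) is positive exactly when two samples share a species
same.mat <- (tcrossprod(ind) > 0) * 2 - 1
same.mat
```

The result carries the ech labels on both rows and columns because table() keeps them as dimnames.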

Similarity index
