In R, how do I compute the KL distance between two string vectors?

Asked: 2017-04-24 20:50:09

Tags: r

Suppose I have two string vectors, such as:

> list1 = c("cat", "dog", "cat", "rabbit", "dog", "cat")
> list2 = c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

I can compute a distribution for each one. For example:

> dist1 = table(list1)/length(list1)
> dist2 = table(list2)/length(list2)
> dist1; dist2

list1
      cat       dog    rabbit 
0.5000000 0.3333333 0.1666667 
list2
      cat       dog     mouse    rabbit 
0.1428571 0.4285714 0.1428571 0.2857143 

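How do I compute the KL distance between these two distributions? (Using dist2 as the baseline.)

The KL functions I have seen (e.g., kl.dist) require vectors of the same length.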

1 Answer:

Answer 0: (score: 0)

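The following builds a data frame with one row per distinct string and one column per vector holding that string's relative frequency, so both distributions end up the same length: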

library(dplyr)

list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

dist1 <- table(list1)/length(list1)
dist2 <- table(list2)/length(list2)

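# Align the two distributions by category; a category missing from one vector becomes NA
BothDist <- full_join(as.data.frame(dist1), as.data.frame(dist2), by = c("list1" = "list2"))
# A category that never occurs in a vector has probability 0
BothDist[is.na(BothDist)] <- 0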

BothDist
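
Once both distributions are defined over the same set of categories, the KL divergence itself is a one-line sum. The following is a minimal sketch in base R, not a library function: `kl_divergence` is a hypothetical helper name, and it assumes the usual conventions that a term with p = 0 contributes 0 and that the divergence is infinite when the baseline assigns probability 0 to a category the first vector contains. It uses the natural logarithm, so the result is in nats.

```r
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

# Hypothetical helper (not from any package): D(P || Q) in nats,
# where P is the distribution of `x` and Q is the distribution of `baseline`.
kl_divergence <- function(x, baseline) {
  # Put both vectors on the same category set so the probabilities line up
  categories <- union(x, baseline)
  p <- as.numeric(table(factor(x, levels = categories))) / length(x)
  q <- as.numeric(table(factor(baseline, levels = categories))) / length(baseline)

  # Assumed convention: a category with p > 0 but q == 0 makes the divergence infinite
  if (any(p > 0 & q == 0)) return(Inf)

  # Assumed convention: terms with p == 0 contribute 0, so drop them
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

kl_12 <- kl_divergence(list1, list2)  # dist2 as the baseline
kl_21 <- kl_divergence(list2, list1)  # dist1 as the baseline
kl_12; kl_21
```

With dist2 as the baseline the result is finite (about 0.453 nats), because every string in list1 also appears in list2. Reversing the roles gives Inf, since "mouse" occurs in list2 but never in list1. If a finite value is needed in both directions, smoothing the zero counts is one common workaround, though it changes the quantity being measured.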