I have a data frame df, and I want to compute a conditional entropy based on its two columns.
df<-structure(list(Col1 = structure(1:10, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H", "I", "J"), class = "factor"), Col2 = c(1,
4, 5, 3, 6, 3, 1, 3, 6, 7)), .Names = c("Col1", "Col2"), row.names = c(NA,
-10L), class = "data.frame")
I know how to compute the entropy H(X) of a value chosen at random from Col2, using the following code:
vec<-as.vector(df$Col2)
freq <- table(vec)/length(vec)
vector1 <- as.data.frame(freq)[,2]
#Entropy
-sum(vector1 * log2(vector1))
Now, how would I compute the conditional entropy based on Col1 and Col2? Let Y denote Col1; I want to compute H(X | Y).
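For reference, one possible approach uses the chain rule H(X | Y) = H(X, Y) - H(Y), estimating both terms from empirical frequencies in the same way as the H(X) code above. The following is only a minimal sketch under that assumption: the helper name entropy_bits and the variables joint, h_xy, h_y and h_x_given_y are hypothetical and not part of the original code, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Example data from the question (Col1 plays the role of Y, Col2 of X)
df <- data.frame(
  Col1 = factor(LETTERS[1:10]),
  Col2 = c(1, 4, 5, 3, 6, 3, 1, 3, 6, 7)
)

# Hypothetical helper: Shannon entropy in bits of a probability vector.
# Zero-probability cells are dropped, since 0 * log2(0) is taken as 0.
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Empirical joint distribution: rows index Col1 (Y), columns index Col2 (X)
joint <- table(df$Col1, df$Col2) / nrow(df)

h_xy <- entropy_bits(as.vector(joint))  # H(X, Y)
h_y  <- entropy_bits(rowSums(joint))    # H(Y), from the marginal of Col1

# Chain rule: H(X | Y) = H(X, Y) - H(Y)
h_x_given_y <- h_xy - h_y
h_x_given_y
```

With this particular df, every value of Col1 occurs exactly once, so knowing Y determines X and the result is 0 (up to floating-point rounding). A non-trivial value only appears when some level of Col1 is repeated with different Col2 values.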