Please correct my entropy code

Date: 2014-02-09 22:50:01

Tags: vb.net string entropy

I am writing code to calculate the Shannon entropy of a string.

 Dim entropytext As String = Result.Text

    Dim theresult = entropytext.GroupBy(Function(o) o) _
        .Select(Function(o) New With {.Count = o.Count(), .Character = o.Key}) _
        .GroupBy(Function(o) o.Count, Function(o) o.Character) _
        .OrderByDescending(Function(o) o.Key)

    Dim totalEntropy As Double = 0
    Dim partialEntropy As Double
    Dim partialP As Double

    For Each item In theresult
        Console.Write(item.Key & " of chars: ")

        For Each character In item
            Console.Write(character)
        Next

        partialP = item.Key / entropytext.Count
        Console.Write(". p of each " & partialP & ", total p = " & item.Count * partialP)
        partialEntropy = partialP * Math.Log(partialP) * item.Count
        totalEntropy += partialEntropy
        Console.WriteLine()
    Next

    totalEntropy *= -1
    TextBox1.Text = totalEntropy & " Bits"
End Sub

The math:

Entropy = -∑(P_x log(P_x))
P_x = N_x/∑(N_x)

where P_x is the probability of letter x,

N_x is the number of occurrences of letter x.

So,

textbox1 ='AATC'

Entropy (textbox1)=-([2/4 log(2/4)]+[1/4 log (1/4)]+[1/4 log (1/4)])
= 1.0397

But this is too low... According to (http://www.shannonentropy.netmark.pl/) it should be "1.5". What am I doing wrong? Thanks in advance!!

Basically, it should work like this... but I am not proficient in C#...

public static double ShannonEntropy(string s)
{
    var map = new Dictionary<char, int>();
    foreach (char c in s)
    {
        if (!map.ContainsKey(c))
            map.Add(c, 1);
        else
            map[c] += 1;
    }

    double result = 0.0;
    int len = s.Length;
    foreach (var item in map)
    {
        var frequency = (double)item.Value / len;
        result -= frequency * (Math.Log(frequency) / Math.Log(2));
    }

    return result;
}

1 answer:

Answer 0 (score: 1)

Here is a direct port of the C# code to VB.NET:

Public Shared Function ShannonEntropy(s As String) As Double
    Dim map = New Dictionary(Of Char, Integer)()
    For Each c As Char In s
        If Not map.ContainsKey(c) Then
            map.Add(c, 1)
        Else
            map(c) += 1
        End If
    Next

    Dim result As Double = 0.0
    Dim len As Integer = s.Length
    For Each item As KeyValuePair(Of Char, Integer) In map
        Dim frequency = CDbl(item.Value) / len
        result -= frequency * (Math.Log(frequency) / Math.Log(2))
    Next

    Return result
End Function

If the C# code produces the result you are looking for, this code will give the same result.
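As for why the original code gives 1.0397 instead of 1.5: the two numbers appear to differ only in the base of the logarithm. Math.Log with one argument is the natural logarithm, so the original loop yields the entropy in nats, while the C# version divides by Math.Log(2) to get bits. In the original VB.NET loop, calling Math.Log(partialP, 2) should have the same effect. The short Python sketch below is not part of the original post; it is only an independent check of the 'AATC' example, and the function name shannon_entropy and its base parameter are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str, base: float = 2.0) -> float:
    """Return the Shannon entropy of `text`, using logarithms in `base`."""
    if not text:
        return 0.0                  # convention chosen here: empty input has zero entropy
    counts = Counter(text)          # N_x for each character x
    total = len(text)               # sum of all N_x
    entropy = 0.0
    for count in counts.values():
        p = count / total           # P_x = N_x / sum(N_x)
        entropy -= p * math.log(p, base)
    return entropy


print(shannon_entropy("AATC", base=math.e))  # natural log: about 1.0397 (nats)
print(shannon_entropy("AATC", base=2))       # base 2: about 1.5 (bits)
```

The two results differ by a factor of ln 2 (about 0.6931): 1.0397 / 0.6931 is about 1.5, which matches the value reported by the online calculator.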