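I need a solution that classifies text into multiple categories. This approach seems to work well: http://www.codeproject.com/Articles/14270/A-Naive-Bayesian-Classifier-in-C
My only problem is with the scores it returns. Currently, the category with the highest score is the best match.
But I would like to get a percentage value for each category.
This is the part that computes the scores: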
/// <summary>
/// Classifies a text</summary>
/// <returns>
/// returns classification values for the text, the higher, the better is the match.</returns>
public Dictionary<string, double> Classify(System.IO.StreamReader tr)
{
    // Start every category at a score of 0.
    Dictionary<string, double> score = new Dictionary<string, double>();
    foreach (KeyValuePair<string, ICategory> cat in m_Categories)
    {
        score.Add(cat.Value.Name, 0.0);
    }

    // Count the phrases of the input text.
    EnumerableCategory words_in_file = new EnumerableCategory("", m_ExcludedWords);
    words_in_file.TeachCategory(tr);

    // For each phrase, add the log of its relative frequency in each category.
    foreach (KeyValuePair<string, PhraseCount> kvp1 in words_in_file)
    {
        PhraseCount pc_in_file = kvp1.Value;
        foreach (KeyValuePair<string, ICategory> kvp in m_Categories)
        {
            ICategory cat = kvp.Value;
            int count = cat.GetPhraseCount(pc_in_file.RawPhrase);
            if (0 < count)
            {
                score[cat.Name] += System.Math.Log((double)count / (double)cat.TotalWords);
            }
            else
            {
                // Unseen phrase: use a small constant instead of Log(0).
                score[cat.Name] += System.Math.Log(0.01 / (double)cat.TotalWords);
            }
            System.Diagnostics.Trace.WriteLine(pc_in_file.RawPhrase.ToString() + "(" +
                cat.Name + ")" + score[cat.Name]);
        }
    }

    // Add the log of each category's share of all words (the prior).
    foreach (KeyValuePair<string, ICategory> kvp in m_Categories)
    {
        ICategory cat = kvp.Value;
        score[cat.Name] += System.Math.Log((double)cat.TotalWords / (double)this.CountTotalWordsInCategories());
    }
    return score;
}
Thanks for your help!
Answer 0 (score: 1)
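If I understand correctly, you need to sum all the Values in the Dictionary, which gives you your 100%. Then divide each Value by that sum.
Insert the following just before the return score; statement: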
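// Requires "using System.Linq;" for Sum() and ToList().
double sum = score.Values.Sum();
// Loop over a copy of the keys: assigning to the dictionary while
// enumerating score.Keys throws InvalidOperationException on .NET Framework.
foreach (var name in score.Keys.ToList())
{
    score[name] /= sum;
}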
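
One caveat, if I read the question's code correctly: every term added to a score is the logarithm of a ratio that is at most 1, so the scores are zero or negative. Dividing by their sum does produce values that add up to 1, but the category with the worst (most negative) score ends up with the largest share. A common alternative is to exponentiate the scores first and then normalize (a softmax over the log-scores). The following is only a minimal sketch of that idea, not part of the CodeProject classifier: the ScoreNormalizer class, the ToPercentages method, and the sample scores are made up for illustration, and it assumes the dictionary is not empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the CodeProject classifier.
public static class ScoreNormalizer
{
    // Converts log-scores into shares that sum to 1 (softmax over the scores).
    // Assumes logScores contains at least one entry.
    public static Dictionary<string, double> ToPercentages(Dictionary<string, double> logScores)
    {
        // Subtract the largest score before exponentiating; otherwise
        // Math.Exp can underflow to 0 for every category on long texts.
        double max = logScores.Values.Max();

        // Build new dictionaries instead of modifying the input while enumerating it.
        Dictionary<string, double> weights =
            logScores.ToDictionary(kv => kv.Key, kv => Math.Exp(kv.Value - max));
        double total = weights.Values.Sum();

        return weights.ToDictionary(kv => kv.Key, kv => kv.Value / total);
    }

    public static void Main()
    {
        // Made-up log-scores, for illustration only.
        var score = new Dictionary<string, double>
        {
            { "sports", -120.0 },
            { "politics", -123.0 },
            { "science", -130.0 }
        };

        foreach (KeyValuePair<string, double> kv in ToPercentages(score))
        {
            // "P1" formats a fraction as a percentage with one decimal place.
            Console.WriteLine("{0}: {1:P1}", kv.Key, kv.Value);
        }
    }
}
```

In Classify this would amount to replacing return score; with return ScoreNormalizer.ToPercentages(score);. The category with the highest original score still gets the highest percentage, although the resulting values are relative shares among the known categories rather than calibrated probabilities.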