What is it called - getting the most common word segments from a list of keywords?

Time: 2012-10-24 15:06:48

Tags: c# .net

Suppose I have a list of keywords:

free numerology compatibility
numerology calculator free
free numerology report
numerology reading
free numerology reading
etc...

Which C# algorithm produces the following results, or what is the technique called, so that I can research it further?

6 instances of "numerology"
3 instances of "free numerology"
2 instances of "numerology reading"
1 instance of "numerology compatibility"
1 instance of "numerology calculator"
etc...

2 answers:

Answer 0: (score: 0)

You can loop over the array of words and use a dictionary to store the counts.

For example:

// Maps each word to the number of times it has been seen so far.
Dictionary<string, int> d = new Dictionary<string, int>();

foreach (string word in wordList)
{
    if (d.ContainsKey(word))
    {
       d[word]++;
    }
    else
    {
       d[word] = 1;
    }
}
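The loop above assumes a `wordList` that already holds single words, whereas the question starts from whole keyword phrases. A minimal sketch of the missing step, assuming the phrases are separated into words by whitespace and that case should be ignored, might look like this. The names `WordCounter` and `CountWords` are hypothetical and do not come from the answer.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Hypothetical helper: counts how often each word occurs
    // across all keyword phrases.
    public static Dictionary<string, int> CountWords(IEnumerable<string> keywords)
    {
        // Assumption: "Free" and "free" should count as the same word.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string keyword in keywords)
        {
            // A null separator splits on whitespace; empty entries
            // produced by repeated spaces are dropped.
            string[] words = keyword.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            foreach (string word in words)
            {
                int current;
                counts.TryGetValue(word, out current); // current stays 0 if the word is new
                counts[word] = current + 1;
            }
        }

        return counts;
    }
}
```

`TryGetValue` is used instead of `ContainsKey` followed by the indexer so that each word needs only one lookup before the write; the behavior is otherwise the same as the loop in the answer.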

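Answer 1: (score: 0)

The topic you are looking for is called term frequency analysis (also known as word frequency analysis). The following code gives you the frequency of each word. Finding the frequency of one given phrase is also easy, but analyzing a whole document to find every sequence of terms that occurs more than once is somewhat more complicated.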

// Counts each space-separated word of inputText into wordFreq.
void Analyze(string inputText, Dictionary<string, int> wordFreq)
{
    // Drop empty entries so that repeated spaces are not counted as words.
    string[] words = inputText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    foreach (string word in words)
    {
        if (wordFreq.ContainsKey(word))
            wordFreq[word]++;
        else
            wordFreq.Add(word, 1);
    }
}

void DoWork()
{
    // All keywords from the question joined into one space-separated string.
    string inputText = "free numerology compatibility numerology calculator free free numerology report numerology reading free numerology reading";
    Dictionary<string, int> wordFreq = new Dictionary<string, int>();

    Analyze(inputText, wordFreq);

    // Build one output line per distinct word.
    string result = "";
    foreach (KeyValuePair<string, int> pair in wordFreq)
    {
        result += pair.Value + " Instances of " + pair.Key + "\r\n";
    }

    MessageBox.Show(result);
}

private void Form1_Load(object sender, EventArgs e)
{
    DoWork();
}
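The harder case mentioned in this answer, counting multi-word sequences rather than single words, is usually described as counting word n-grams. The following is only a sketch under stated assumptions: phrases are counted inside each keyword (never across two keywords), words are separated by whitespace, and case is ignored. The names `PhraseCounter`, `CountPhrases`, `MostCommon`, and the `maxWords` parameter are hypothetical and are not part of either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseCounter
{
    // Hypothetical helper: counts every run of 1..maxWords consecutive
    // words (word n-grams) found inside each keyword phrase.
    public static Dictionary<string, int> CountPhrases(IEnumerable<string> keywords, int maxWords)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string keyword in keywords)
        {
            string[] words = keyword.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // For every start position, emit the phrases of length 1, 2, ...
            // that still fit inside this keyword.
            for (int start = 0; start < words.Length; start++)
            {
                for (int length = 1; length <= maxWords && start + length <= words.Length; length++)
                {
                    string phrase = string.Join(" ", words, start, length);

                    int current;
                    counts.TryGetValue(phrase, out current); // 0 if the phrase is new
                    counts[phrase] = current + 1;
                }
            }
        }

        return counts;
    }

    // Orders the phrases by descending count; ties are broken alphabetically
    // so that the order is deterministic.
    public static List<KeyValuePair<string, int>> MostCommon(Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Called with the five keywords listed in the question and `maxWords` set to 2, this sketch counts "numerology" 5 times, "free numerology" 3 times, and "numerology reading" 2 times. The question's sample output shows 6 for "numerology" because its keyword list continues past the five lines shown. Because counting happens per keyword, this differs from the `DoWork` example, which joins all keywords into one string first; on a joined string, pairs such as "compatibility numerology" would also be counted even though they span two keywords.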