Count word frequencies in a string (most frequent words first), excluding certain keywords

Date: 2010-08-31 09:34:58

Tags: vb.net linq count word-count word-frequency

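I want to count how often each word occurs in a string (excluding certain keywords) and sort the words by frequency in descending order. How can I do this?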

Given the following string:

This is stackoverflow. I repeat stackoverflow.

where the excluded keywords are

ExKeywords() ={"i","is"}

the output should be:

stackoverflow  
repeat         
this           

P.S. No! I'm not reinventing Google! :)

2 answers:

Answer 0 (score: 4)

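using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "This is stackoverflow. I repeat stackoverflow.";
string[] keywords = new[] {"i", "is"};
// \w+ matches runs of word characters, so punctuation such as "." is dropped.
Regex regex = new Regex("\\w+");

foreach (var group in regex.Matches(input)
    .OfType<Match>()
    .Select(c => c.Value.ToLowerInvariant())   // normalize case so "This" and "this" match
    .Where(c => !keywords.Contains(c))          // drop the excluded keywords
    .GroupBy(c => c)                            // one group per distinct word
    .OrderByDescending(c => c.Count())          // most frequent words first
    .ThenBy(c => c.Key))                        // ties broken alphabetically
{
    Console.WriteLine(group.Key);
}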
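The query above can also be wrapped in a reusable method that returns each word together with its count. The sketch below is a hypothetical helper (`WordFrequency.TopWords` is not from either answer); it assumes word boundaries are what `\w+` matches and uses a case-insensitive `HashSet` so that both "I" and "i" are excluded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFrequency
{
    // Hypothetical helper: returns (word, count) pairs, most frequent first,
    // with ties broken alphabetically.
    public static List<KeyValuePair<string, int>> TopWords(string text, IEnumerable<string> excluded)
    {
        // Case-insensitive lookup so "I" and "i" are both treated as excluded.
        var skip = new HashSet<string>(excluded, StringComparer.OrdinalIgnoreCase);

        return Regex.Matches(text, @"\w+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())        // normalize case
            .Where(w => !skip.Contains(w))                  // drop excluded keywords
            .GroupBy(w => w)                                // one group per distinct word
            .OrderByDescending(g => g.Count())              // most frequent first
            .ThenBy(g => g.Key, StringComparer.Ordinal)     // alphabetical tie-break
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var result = TopWords("This is stackoverflow. I repeat stackoverflow.", new[] { "i", "is" });
        foreach (var pair in result)
            Console.WriteLine($"{pair.Key} {pair.Value}");
    }
}
```

For the example string this prints `stackoverflow 2`, `repeat 1` and `this 1`, one per line. Returning the counts rather than only the keys leaves it to the caller to decide whether to display them or take just the top N.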

Answer 1 (score: 0)

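using System;
using System.Linq;

string s = "This is stackoverflow. I repeat stackoverflow.";
string[] notRequired = {"i", "is"};

// Split on spaces and punctuation so "stackoverflow." counts as "stackoverflow",
// lowercase each word, drop the excluded keywords, then sort by frequency.
var myData =
    from word in s.Split(new[] { ' ', '.', ',', '!', '?' }, StringSplitOptions.RemoveEmptyEntries)
    let w = word.ToLower()
    where !notRequired.Contains(w)
    group w by w into g
    orderby g.Count() descending, g.Key
    select g.Key;

foreach (string item in myData)
    Console.WriteLine(item);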