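I need to count how many times each keyword occurs in a string, then sort the keywords by count, highest first. What is the fastest algorithm for this in .NET?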
Answer 0 (score: 6)
Edit: the following code groups the unique tokens together with their counts:
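using System.Linq;
// src is the input string; split it on spaces into tokens
string[] target = src.Split(new char[] { ' ' });
// group identical tokens, count each group once, highest count first
var results = target.GroupBy(t => t)
                    .Select(g => new { str = g.Key, count = g.Count() })
                    .OrderByDescending(r => r.count);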
This is finally starting to make more sense to me...
Edit: the following code pairs each count with its target substring:
using System;
using System.Linq;
string src = "for each character in the string, take the rest of the " +
"string starting from that character " +
"as a substring; count it if it starts with the target string";
string[] target = {"string", "the", "in"};
// for each target, count the positions i where src.Substring(i) starts with it
var results = target.Select(t => new {str = t,
count = src.Select((c, i) => src.Substring(i)).
Count(sub => sub.StartsWith(t, StringComparison.Ordinal))});
The results are now:
+ [0] { str = "string", count = 4 } <Anonymous Type>
+ [1] { str = "the", count = 4 } <Anonymous Type>
+ [2] { str = "in", count = 6 } <Anonymous Type>
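The Substring-based query above allocates a new string for every position in src, so its time and memory grow roughly quadratically with the input length. A minimal sketch of the same overlapping-substring count without those allocations, assuming ordinal (case-sensitive) matching is acceptable; `CountOccurrences` is a hypothetical helper name, not part of the original answer:

```csharp
using System;
using System.Linq;

// Hypothetical helper (not from the original answer): counts possibly
// overlapping occurrences of `target` in `text` by calling IndexOf repeatedly,
// restarting one character after each match.
static int CountOccurrences(string text, string target)
{
    if (target.Length == 0) return 0;   // an empty target would loop forever
    int count = 0;
    int index = text.IndexOf(target, StringComparison.Ordinal);
    while (index >= 0)
    {
        count++;
        index = text.IndexOf(target, index + 1, StringComparison.Ordinal);
    }
    return count;
}

string src = "for each character in the string, take the rest of the " +
             "string starting from that character " +
             "as a substring; count it if it starts with the target string";
string[] targets = { "string", "the", "in" };

// Pair each target with its count, highest count first
// (OrderByDescending is stable, so ties keep their original order).
var results = targets
    .Select(t => new { str = t, count = CountOccurrences(src, t) })
    .OrderByDescending(r => r.count)
    .ToList();

foreach (var r in results)
    Console.WriteLine($"{r.str}: {r.count}");
```

The counts match the debugger output above (in = 6, string = 4, the = 4), but the work per target is one linear scan instead of one Substring per character.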
The original code follows:
string src = "for each character in the string, take the rest of the " +
"string starting from that character " +
"as a substring; count it if it starts with the target string";
string[] target = {"string", "the", "in"};
var results = target.Select(t => src.Select((c, i) => src.Substring(i)).
Count(sub => sub.StartsWith(t))).OrderByDescending(t => t);
Result from the debugger (extra logic is needed to include the matched string alongside its count):
- results {System.Linq.OrderedEnumerable<int,int>}
- Results View Expanding the Results View will enumerate the IEnumerable
[0] 6 int
[1] 4 int
[2] 4 int
Answer 1 (score: 4)
Dunno about fastest, but LINQ is probably the easiest to understand:
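using System;
using System.Linq;
var myListOfKeywords = new [] {"struct", "public", ...};
// split on the delimiters; a string[] separator needs a StringSplitOptions argument
var keywordCount = from keyword in myProgramText.Split(new []{" ","(", ...}, StringSplitOptions.RemoveEmptyEntries)
group keyword by keyword into g
where myListOfKeywords.Contains(g.Key)
orderby g.Count() descending
select new {g.Key, Count = g.Count()};
foreach(var element in keywordCount)
Console.WriteLine(String.Format("Keyword: {0}, Count: {1}", element.Key, element.Count));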
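You could write this in a non-LINQ way, but the basic premise is the same: split the string into words, then count the occurrences of each word you are interested in.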
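For comparison, a non-LINQ sketch of that same premise: split the text into words and count only the words of interest in a Dictionary. The sample text, keyword list, delimiter set, and the name `CountKeywords` are all hypothetical, chosen for illustration; the answer above elides its full keyword and delimiter lists.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts how often each keyword appears as a whole
// token in `text`, ignoring every other token.
static Dictionary<string, int> CountKeywords(string text, IEnumerable<string> keywords, char[] delimiters)
{
    var wanted = new HashSet<string>(keywords);   // O(1) average membership checks
    var counts = new Dictionary<string, int>();
    foreach (string word in text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
    {
        if (!wanted.Contains(word)) continue;       // not a keyword of interest
        counts.TryGetValue(word, out int current);  // current is 0 when the word is new
        counts[word] = current + 1;
    }
    return counts;
}

// Hypothetical sample input.
string programText = "public struct Point { public int X; public int Y; }";
var keywordCounts = CountKeywords(programText,
    new[] { "struct", "public", "int" },
    new[] { ' ', '(', ')', '{', '}', ';' });

foreach (var pair in keywordCounts)
    Console.WriteLine($"Keyword: {pair.Key}, Count: {pair.Value}");
```

Using a HashSet for the keyword filter keeps each lookup constant-time on average, whereas calling Contains on an array scans the whole keyword list for every token.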
Answer 2 (score: 2)
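A simple algorithm: split the string into an array of words, iterate over the array, and store each word's count in a hash table. When you are done, sort by count.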
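A minimal sketch of that split/hash/sort algorithm in C#, assuming words are separated by whitespace and a few punctuation marks and that matching is case-sensitive (both assumptions; adjust the delimiters to your input). Counting is one pass with O(1) average Dictionary updates; the final sort is O(k log k) in the number k of distinct words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample input.
string text = "the cat and the hat and the bat";

// 1. Split into words; RemoveEmptyEntries drops blanks from repeated delimiters.
string[] words = text.Split(new[] { ' ', '\t', '\n', ',', '.', ';' },
                            StringSplitOptions.RemoveEmptyEntries);

// 2. One pass over the words, keeping a running count per word in a hash table.
var counts = new Dictionary<string, int>();
foreach (string word in words)
{
    counts.TryGetValue(word, out int n);   // n is 0 the first time a word is seen
    counts[word] = n + 1;
}

// 3. Sort the (word, count) pairs by count, highest first; break ties alphabetically.
var sorted = new List<KeyValuePair<string, int>>(counts);
sorted.Sort((a, b) =>
{
    int byCount = b.Value.CompareTo(a.Value);
    return byCount != 0 ? byCount : string.CompareOrdinal(a.Key, b.Key);
});

foreach (var pair in sorted)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

For this input the sorted list is the (3), and (2), then bat, cat, and hat with 1 each. The alphabetical tie-break is only there to make the order deterministic, since List.Sort is not stable.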
Answer 3 (score: 1)
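You could break the string into a collection of strings, one per word, and then run a LINQ query over the collection. I doubt it would be the fastest option, but it would probably be faster than regular expressions.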