How do I find all repeated character sequences in a string?

Asked: 2017-05-15 20:59:46

Tags: c# string foreach sequence identifier

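Hi, I'm having trouble making this change to my code. It currently identifies repeated words, but what about repeated character sequences?

For example, if the user enters: the rest is a test

the program should output: MOST COMMON: "est" (but I can't get this to work)

Or if the user enters: the same game

the program should output: MOST COMMON: "ame"

It must be case-sensitive ("XY" must not be treated as the same as "xY" or "Xy"). Here is my current code: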

    string words;
    Console.WriteLine("Input string:");
    words = Console.ReadLine();
    var results = words.Split(' ').Where(x => x.Length > 3)
                                  .GroupBy(x => x)
                                  .Select(x => new { Count = x.Count(), Word = x.Key })
                                  .OrderByDescending(x => x.Count);

    foreach (var item in results)
        Console.WriteLine(String.Format("{0} occurred {1} times", item.Word, item.Count));

    Console.WriteLine("Most common = " + results.First());
    Console.WriteLine("Least common = " + results.Last());

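1 answer:

Answer 0: (score: 2)

Split the input into words, assume a minimum length of 3 characters, and pick the most common sequence, preferring the longest one when counts are tied. Note that the query below only considers sequences at the end of each word: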

    var results = words.Split(' ')
                       // for each word, take every ending of length 3..w.Length
                       .SelectMany(w => Enumerable.Range(3, Math.Max(0, w.Length - 2))
                                                  .Select(n => w.Substring(w.Length - n, n)))
                       .GroupBy(pw => pw)
                       .Select(pwg => new { Common = pwg.Key, Count = pwg.Count() })
                       .OrderByDescending(cc => cc.Count)
                       .ThenByDescending(cc => cc.Common.Length)
                       .Take(1);
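
The query above only looks at word endings. If the repeated sequence may sit anywhere inside a word, one possible variation is to enumerate every substring of at least the minimum length instead. The following is only a sketch under that assumption; the names `RepeatedSequences`, `MostCommon` and `AllSubstrings` are made up for illustration and are not part of the original answer. Grouping uses `StringComparer.Ordinal`, so "XY", "xY" and "Xy" stay distinct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatedSequences
{
    // Hypothetical helper: returns the most frequent substring of at least
    // minLength characters, taken from anywhere inside each word.
    // Ties on count are broken in favour of the longer sequence.
    // Returns an empty string when no word is long enough.
    public static string MostCommon(string input, int minLength = 3)
    {
        var best = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .SelectMany(w => AllSubstrings(w, minLength))
            .GroupBy(s => s, StringComparer.Ordinal)   // case-sensitive grouping
            .Select(g => new { Sequence = g.Key, Count = g.Count() })
            .OrderByDescending(x => x.Count)
            .ThenByDescending(x => x.Sequence.Length)
            .FirstOrDefault();

        return best == null ? string.Empty : best.Sequence;
    }

    // Yields every substring of word whose length is at least minLength.
    private static IEnumerable<string> AllSubstrings(string word, int minLength)
    {
        for (int start = 0; start + minLength <= word.Length; start++)
            for (int len = minLength; start + len <= word.Length; len++)
                yield return word.Substring(start, len);
    }

    public static void Main()
    {
        Console.WriteLine("Input string:");
        string words = Console.ReadLine() ?? "";
        Console.WriteLine("MOST COMMON: \"" + MostCommon(words) + "\"");
    }
}
```

With the two inputs from the question, `MostCommon("the rest is a test")` gives "est" and `MostCommon("the same game")` gives "ame". A sequence that occurs more than once inside a single word is counted each time, which may or may not be what you want.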