Determining the unique (repeating) string from a repeated string in C#

Posted: 2012-12-03 20:53:04

Tags: c# .net string string-parsing

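I need to develop an efficient algorithm that, given a string made up of repeated content (and only repeated content), determines the unique (repeating) string...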

For example:

"AbcdAbcdAbcdAbcd" => "Abcd"

"Hello" => "Hello"

I'm having trouble coming up with a reasonably efficient algorithm; any input would be appreciated.

Clarification: I want the shortest string that, when repeated enough times, equals the whole string.
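For comparison, here is a linear-time sketch that is not taken from any of the answers below; it swaps in the KMP prefix (failure) function. For a string of length n, its smallest period is p = n - pi[n - 1], and the string is an exact repetition of its first p characters only when p divides n; otherwise no shorter unit tiles it. The class and method names (`PrefixPeriod`, `ShortestRepeatingUnit`) are hypothetical.

```csharp
using System;

// Hypothetical helper: shortest repeating unit in O(n) via the KMP prefix function.
public static class PrefixPeriod
{
    public static string ShortestRepeatingUnit(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        int n = s.Length;
        if (n == 0) return s;

        // pi[i] = length of the longest proper prefix of s[0..i]
        // that is also a suffix of s[0..i].
        int[] pi = new int[n];
        for (int i = 1; i < n; i++)
        {
            int k = pi[i - 1];
            while (k > 0 && s[i] != s[k])
                k = pi[k - 1];
            if (s[i] == s[k])
                k++;
            pi[i] = k;
        }

        // Smallest period of s; it tiles the whole string only if it divides n.
        int period = n - pi[n - 1];
        return n % period == 0 ? s.Substring(0, period) : s;
    }
}
```

For example, `"AbcdAbcdAbcdAbcd"` has pi[15] = 12, so the period is 4 and the result is `"Abcd"`; `"abaab"` has period 3, which does not divide 5, so the whole string comes back.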

6 answers:

Answer 0 (score: 3)

    using System;

    private static string FindShortestRepeatingString(string value)
    {
        if (value == null) throw new ArgumentNullException("value", "The value parameter is null.");

        for (int substringLength = 1; substringLength <= value.Length / 2; substringLength++)
            if (IsRepeatingStringOfLength(value, substringLength))
                return value.Substring(0, substringLength);
        return value;
    }

    private static bool IsRepeatingStringOfLength(string value, int substringLength)
    {
        if (value.Length % substringLength != 0)
            return false;
        int instanceCount = value.Length / substringLength;
        for (int characterCounter = 0; characterCounter < substringLength; characterCounter++)
        {
            char currentChar = value[characterCounter];
            for (int instanceCounter = 1; instanceCounter < instanceCount; instanceCounter++)
                if (value[instanceCounter * substringLength + characterCounter] != currentChar)
                    return false;
        }
        return true;
    }

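Answer 1 (score: 1)

Something like this:

    using System.Text;

    public string ShortestRepeating(string str)
    {
        for (int len = 1; len <= str.Length / 2; len++)
        {
            // Only lengths that divide the total length evenly can be candidates.
            if (str.Length % len == 0)
            {
                string sub = str.Substring(0, len);
                StringBuilder builder = new StringBuilder(str.Length);
                // Repeat the candidate until it is as long as the original string.
                while (builder.Length < str.Length)
                    builder.Append(sub);
                if (str == builder.ToString())
                    return sub;
            }
        }
        return str;
    }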

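This simply takes substrings from the start of the string and repeats them to see whether they reproduce the original. It skips any length that does not divide the original string's length evenly, and it only goes up to length / 2, because nothing longer than that can repeat at least twice to form the string. If no candidate matches, the whole string is returned.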
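A minimal sketch of the same search that avoids building a new string for every candidate, assuming ordinal (character-by-character) equality is what you want: when len divides the length, the string is a repetition of its first len characters exactly when it equals itself shifted by len, and `string.CompareOrdinal` can check that in place. `ShiftCheck` and its method name are hypothetical.

```csharp
using System;

// Hypothetical variant: compare the string with itself shifted by len
// instead of rebuilding it with a StringBuilder.
public static class ShiftCheck
{
    public static string ShortestRepeating(string str)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));

        for (int len = 1; len <= str.Length / 2; len++)
        {
            // Only lengths that divide the total length evenly can tile it.
            if (str.Length % len != 0)
                continue;

            // str repeats its first len characters exactly when
            // str[i] == str[i + len] for every i < str.Length - len.
            if (string.CompareOrdinal(str, 0, str, len, str.Length - len) == 0)
                return str.Substring(0, len);
        }
        return str;
    }
}
```

Each candidate check is still linear in the string length, so the overall cost is the same order as the loop above; the saving is in allocations.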

Answer 2 (score: 1)

Perhaps this would work:

    using System;
    using System.Linq;

    static string FindShortestSubstringPeriod(string input)
    {
      if (string.IsNullOrEmpty(input))
        return input;

      for (int length = 1; length <= input.Length / 2; ++length)
      {
        // Skip lengths that do not divide the input length evenly.
        int remainder;
        int repetitions = Math.DivRem(input.Length, length, out remainder);
        if (remainder != 0)
          continue;
        // Remove(length) keeps only the first `length` characters.
        string candidate = input.Remove(length);
        if (String.Concat(Enumerable.Repeat(candidate, repetitions)) == input)
          return candidate;
      }
      return input;
    }

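Answer 3 (score: 0)

I'd go with something like this:

    using System.Linq;

    private static string FindRepeat(string str)
    {
        // Enumerable.Range would throw for an empty string (count of -1).
        if (string.IsNullOrEmpty(str))
            return str;

        // Candidate lengths: proper divisors of the string length.
        var lengths = Enumerable.Range(1, str.Length - 1)
            .Where(len => str.Length % len == 0);

        foreach (int len in lengths)
        {
            bool matched = true;

            // Compare every block of length len against the first block.
            for (int index = 0; matched && index < str.Length; index += len)
            {
                for (int i = index; i < index + len; ++i)
                {
                    if (str[i - index] != str[i])
                    {
                        matched = false;
                        break;
                    }
                }
            }

            if (matched)
                return str.Substring(0, len);
        }

        return str;
    }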

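Answer 4 (score: 0)

Try this regular expression:

    ^(\w*?)\1*$

It lazily captures as few characters as possible such that the captured sequence, and only that sequence, repeats zero or more further times through the end of the string. As in Jacob's answer, you can then read the shortest repeating text from the capture group.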
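A sketch of applying that pattern from C#, with hypothetical class and method names. Note that `\w` matches only word characters (letters, decimal digits, and connector punctuation such as the underscore), so input containing spaces or other punctuation does not match at all; this sketch returns null in that case rather than guessing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern ^(\w*?)\1*$ from the answer above.
public static class WordRegexPeriod
{
    private static readonly Regex Pattern = new Regex(@"^(\w*?)\1*$");

    public static string ShortestRepeating(string input)
    {
        Match match = Pattern.Match(input);
        // Group 1 holds the lazily captured unit; null signals that the
        // pattern could not match (e.g. a non-word character in the input).
        return match.Success ? match.Groups[1].Value : null;
    }
}
```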

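Answer 5 (score: -1)

You can use a regular expression with a backreference.

    using System.Text.RegularExpressions;

    // input: the string to examine.
    // \1 refers back to group 1 (\0 would match a NUL character instead).
    Match match = Regex.Match(input, @"^(.*?)\1*$");
    // Groups[0] is the entire match; Groups[1] is the captured repeating unit.
    string smallestRepeat = match.Groups[1].Value;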
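A stricter variant of the backreference pattern, offered as an assumption rather than taken from any answer: without `RegexOptions.Singleline`, `.` does not match `\n`, and `$` also succeeds just before a trailing newline, so `"abab\n"` would yield `"ab"`. Using `\A`...`\z` with `Singleline` makes the capture cover the whole string. `StrictRegexPeriod` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stricter pattern: '.' matches any character (Singleline),
// and \z anchors at the very end of the input.
public static class StrictRegexPeriod
{
    private static readonly Regex Pattern =
        new Regex(@"\A(.*?)\1*\z", RegexOptions.Singleline);

    public static string ShortestRepeating(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        // The pattern always matches (at worst group 1 is the whole input),
        // so group 1 is the shortest unit that tiles the string.
        return Pattern.Match(input).Groups[1].Value;
    }
}
```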