How to find the common part of strings in a List<string> in C#

Time: 2012-11-14 17:37:50

Tags: c# linq

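I'm looking for a concise way to find the longest common part of the strings in a list of strings. Given a list that looks like

{"1 Some Street, Some Town, XYZ" ,
"2 Some Street, Some Town, ABC" ,
"3 Some Street, Some Town, XYZ" ,
"4 Some Street, Some Town, ABC" }

I want to get back the single string "Some Street, Some Town, ". I don't know whether the common part sits at the beginning, the end, or the middle of the input strings. I feel there should be a concise way to do this, but I can't think of one.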

1 answer:

Answer 0: (score: 2)

Adapted from gloomy.penguin's comment, using http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring

// Requires: using System; using System.Collections.Generic; using System.Text;
static void Main(string[] args)
{
    var values = new List<string>
    {"1 Some Street, Some Town, XYZ" ,
    "2 Some Street, Some Town, ABC" ,
    "3 Some Street, Some Town, XYZ" ,
    "4 Some Street, Some Town, ABC" };

    Console.WriteLine(LongestCommonSubstring(values));

    Console.ReadLine();
}

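// Compares every pair of strings and keeps the longest substring found.
// Note: the result is the longest substring shared by some pair, so it is
// not guaranteed to occur in every string of the list.
public static string LongestCommonSubstring(IList<string> values)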
{
    string result = string.Empty;

    for (int i = 0; i < values.Count - 1; i++)
    {
        for (int j = i + 1; j < values.Count; j++)
        {
            string tmp;
            if (LongestCommonSubstring(values[i], values[j], out tmp) > result.Length)
            {
                result = tmp;
            }
        }
    }

    return result;
}

// Source: http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring
public static int LongestCommonSubstring(string str1, string str2, out string sequence)
{
    sequence = string.Empty;
    if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
        return 0;

    int[,] num = new int[str1.Length, str2.Length];
    int maxlen = 0;
    int lastSubsBegin = 0;
    StringBuilder sequenceBuilder = new StringBuilder();

    for (int i = 0; i < str1.Length; i++)
    {
        for (int j = 0; j < str2.Length; j++)
        {
            if (str1[i] != str2[j])
                num[i, j] = 0;
            else
            {
                if ((i == 0) || (j == 0))
                    num[i, j] = 1;
                else
                    num[i, j] = 1 + num[i - 1, j - 1];

                if (num[i, j] > maxlen)
                {
                    maxlen = num[i, j];
                    int thisSubsBegin = i - num[i, j] + 1;
                    if (lastSubsBegin == thisSubsBegin)
                    {//if the current LCS is the same as the last time this block ran
                        sequenceBuilder.Append(str1[i]);
                    }
                    else //this block resets the string builder if a different LCS is found
                    {
                        lastSubsBegin = thisSubsBegin;
                        sequenceBuilder.Length = 0; //clear it
                        sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
                    }
                }
            }
        }
    }
    sequence = sequenceBuilder.ToString();
    return maxlen;
}

Note: in case of a tie, the first substring found wins.
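
One caveat about the list overload above: it takes the maximum over all pairs, so for the sample data it returns " Some Street, Some Town, XYZ" (shared by strings 1 and 3) rather than a substring present in all four. If the goal is a substring common to every string, a minimal sketch could look like the following. This is a swapped-in brute-force approach, not the Wikibooks algorithm, and the names CommonSubstring and LongestCommonSubstringOfAll are hypothetical. It relies on the fact that any substring common to all strings must occur in the shortest one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonSubstring
{
    // Hypothetical helper (not part of the answer above): returns the longest
    // substring that occurs in every string of the list, or "" if none exists.
    public static string LongestCommonSubstringOfAll(IList<string> values)
    {
        if (values == null || values.Count == 0 || values.Any(string.IsNullOrEmpty))
            return string.Empty;

        // Any substring common to all strings must occur in the shortest one.
        // OrderBy is stable, so the earliest of several shortest strings is used.
        string shortest = values.OrderBy(v => v.Length).First();

        // Try candidate lengths from longest to shortest; within one length,
        // scan left to right so that ties go to the earliest candidate.
        for (int length = shortest.Length; length > 0; length--)
        {
            for (int start = 0; start + length <= shortest.Length; start++)
            {
                string candidate = shortest.Substring(start, length);
                if (values.All(v => v.IndexOf(candidate, StringComparison.Ordinal) >= 0))
                    return candidate;
            }
        }

        return string.Empty;
    }
}
```

For the four sample addresses this returns " Some Street, Some Town, ", including the leading space, because that space is also common to all four strings. Trim the result if the leading space is unwanted. The search is brute force, which should be fine for short strings such as addresses but may be slow for long inputs.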