C#: How to compare two words and show which parts differ

Posted: 2013-03-08 19:08:36

Tags: c# string string-comparison

For example, if I have...

string a = "personil";
string b = "personal";

I want to get...

string c = "person[i]l";

However, the difference isn't necessarily a single character. I could also have this...

string a = "disfuncshunal";
string b = "dysfunctional";

In that case, I want to get...

string c = "d[isfuncshu]nal";

Another example... (note that the two words have different lengths.)

string a = "parralele";
string b = "parallel";

string c = "par[ralele]";

Another example...

string a = "ato";
string b = "auto";

string c = "a[]to";

How would I do this?

Edit: The two strings can have different lengths.

Edit: Added more examples. Thanks to user Nenad for asking.

5 answers:

Answer 0 (score: 4)

I must be really bored today, but I actually got a unit test passing for all 4 cases (unless you've added more in the meantime).

Edit: Added 2 edge cases and fixed them.

Edit 2: Handled letters that repeat several times (and errors on those letters).

using System.Linq;
using System.Text;
using NUnit.Framework;

[TestFixture]
public class CompareStringsTests
{
    [Test]
    [TestCase("parralele", "parallel", "par[ralele]")]
    [TestCase("personil", "personal", "person[i]l")]
    [TestCase("disfuncshunal", "dysfunctional", "d[isfuncshu]nal")]
    [TestCase("ato", "auto", "a[]to")]
    [TestCase("inactioned", "inaction", "inaction[ed]")]
    [TestCase("refraction", "fraction", "[re]fraction")]
    [TestCase("adiction", "addiction", "ad[]iction")]
    public void CompareStringsTest(string attempted, string correct, string expectedResult)
    {
        int first = -1, last = -1;

        string result = null;
        int shorterLength = (attempted.Length < correct.Length ? attempted.Length : correct.Length);

        // First - [ : index of the first differing character, scanning from the front
        for (int i = 0; i < shorterLength; i++)
        {
            if (correct[i] != attempted[i])
            {
                first = i;
                break;
            }
        }

        // Last - ] : number of matching characters at the end, scanning from the back
        var a = correct.Reverse().ToArray();
        var b = attempted.Reverse().ToArray();
        for (int i = 0; i < shorterLength; i++)
        {
            if (a[i] != b[i])
            {
                last = i;
                break;
            }
        }

        if (first == -1 && last == -1)
            result = attempted;
        else
        {
            var sb = new StringBuilder();
            // No mismatch found in one direction: the whole overlap matches.
            if (first == -1)
                first = shorterLength;
            if (last == -1)
                last = shorterLength;
            // If the same letter repeats multiple times (ex: addition)
            // and the error is on that letter, we have to trim the trail.
            if (first + last > shorterLength)
                last = shorterLength - first;

            if (first > 0)
                sb.Append(attempted.Substring(0, first));

            sb.Append("[");

            if (last > -1 && last + first < attempted.Length)
                sb.Append(attempted.Substring(first, attempted.Length - last - first));

            sb.Append("]");

            if (last > 0)
                sb.Append(attempted.Substring(attempted.Length - last, last));

            result = sb.ToString();
        }
        Assert.AreEqual(expectedResult, result);
    }
}

Answer 1 (score: 1)

Have you tried DiffLib?

Using that library and the following code (run in LINQPad):

void Main()
{
    string a = "disfuncshunal";
    string b = "dysfunctional";

    var diff = new Diff<char>(a, b);

    var result = new StringBuilder();
    int index1 = 0;
    int index2 = 0;
    foreach (var part in diff)
    {
        if (part.Equal)
            result.Append(a.Substring(index1, part.Length1));
        else
            result.Append("[" + a.Substring(index1, part.Length1) + "]");
        index1 += part.Length1;
        index2 += part.Length2;
    }
    result.ToString().Dump();
}

You get this output:

d[i]sfunc[shu]nal

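Honestly, I don't see what this gives you, since you seem to ignore the changed parts of string b completely and only output the corresponding parts of string a.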
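If you would rather not add a dependency, output in the same style can be produced with a plain longest-common-subsequence (LCS) alignment. This is a minimal sketch of that textbook technique, not DiffLib's own algorithm, and the names LcsMarker and Mark are hypothetical. Every run of non-matching steps becomes one bracketed region holding the characters of a from that run, so a letter that exists only in b shows up as [].

```csharp
using System;
using System.Text;

// Hypothetical helper: marks the parts of `a` that differ from `b`,
// based on a longest-common-subsequence (LCS) alignment of the characters.
public static class LcsMarker
{
    public static string Mark(string a, string b)
    {
        int n = a.Length, m = b.Length;

        // lcs[i, j] = length of the LCS of the suffixes a[i..] and b[j..].
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var sb = new StringBuilder();
        bool inRegion = false; // true while a "[...]" region is open
        int x = 0, y = 0;
        while (x < n || y < m)
        {
            if (x < n && y < m && a[x] == b[y])
            {
                // Matching character: close any open region, then copy it.
                if (inRegion) { sb.Append(']'); inRegion = false; }
                sb.Append(a[x]);
                x++; y++;
                continue;
            }

            // Non-matching step: open a region if one is not open yet.
            if (!inRegion) { sb.Append('['); inRegion = true; }

            if (y == m || (x < n && lcs[x + 1, y] >= lcs[x, y + 1]))
            {
                sb.Append(a[x]); // character only in a: keep it inside the brackets
                x++;
            }
            else
            {
                y++;             // character only in b: skip it, it is not part of a
            }
        }
        if (inRegion) sb.Append(']');
        return sb.ToString();
    }
}
```

With this sketch, Mark("disfuncshunal", "dysfunctional") gives d[i]sfunc[shu]nal, the same as the DiffLib output above, and Mark("ato", "auto") gives a[]to. The table takes O(n·m) memory, which is fine for single words but not for long texts.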

Answer 2 (score: 0)

Here is a complete, working console application that handles both of the examples you gave:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string a = "disfuncshunal";
            string b = "dysfunctional";

            StringBuilder sb = new StringBuilder();
            // Assumes a and b have the same length: b[i] is read for every index of a.
            for (int i = 0; i < a.Length; i++)
            {
                if (a[i] != b[i])
                {
                    sb.Append("[");
                    sb.Append(a[i]);
                    sb.Append("]");

                    continue;
                }

                sb.Append(a[i]);
            }

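            var str = sb.ToString();
            var startIndex = str.IndexOf("[");
            var endIndex = str.LastIndexOf("]");

            if (startIndex < 0)
            {
                // No differences at all: nothing to mark.
                Console.WriteLine(a);
                return;
            }

            // Keep everything up to the first '[' and from the last ']' on,
            // and strip the inner brackets so the whole span is marked once.
            var start = str.Substring(0, startIndex + 1);
            var mid = str.Substring(startIndex + 1, endIndex - startIndex - 1);
            var end = str.Substring(endIndex);

            Console.WriteLine(start + mid.Replace("[", "").Replace("]", "") + end);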
        }
    }
}

This won't work if you want to show several separate mismatched sections of the word.

Answer 3 (score: 0)

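You didn't specify what should happen when the strings have different lengths, but here is a solution for the case where they are the same length: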

private string Compare(string string1, string string2) {
    // This only works if the two strings are the same length.
    string output = "";
    bool mismatch = false;
    for (int i = 0; i < string1.Length; i++) {
        char c1 = string1[i];
        char c2 = string2[i];
        if (c1 == c2) {
            if (mismatch) {
                // A mismatched run just ended: close the bracket.
                output += "]" + c1;
                mismatch = false;
            } else {
                output += c1;
            }
        } else {
            if (mismatch) {
                output += c1;
            } else {
                // A mismatched run starts here: open a bracket.
                output += "[" + c1;
                mismatch = true;
            }
        }
    }
    // Close a mismatched run that lasts until the end of the string.
    if (mismatch) {
        output += "]";
    }
    return output;
}

Answer 4 (score: 0)

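Not a great approach, but as an exercise in LINQ: the task seems to be to find the matching prefix and suffix of the 2 strings and return prefix + "[" + the middle of the first string + "]" + suffix.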

So you can match the prefix (Zip + TakeWhile(a == b)), and then do the same for the suffix by reversing both strings and reversing the result.

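using System;
using System.Linq;

var first = "disfuncshunal";
var second = "dysfunctional";

// Prefix: pair up characters from the front and keep them while they match.
var zipped = first.ToCharArray().Zip(second.ToCharArray(), (f,s)=> new {f,s});
var prefix = string.Join("", 
    zipped.TakeWhile(c => c.f==c.s).Select(c => c.f));

// Suffix: the same on the reversed strings, then reverse the result back.
var zippedReverse = first.ToCharArray().Reverse()
   .Zip(second.ToCharArray().Reverse(), (f,s)=> new {f,s});
var suffix = string.Join("", 
    zippedReverse.TakeWhile(c => c.f==c.s).Reverse().Select(c => c.f));

// Prefix and suffix must not overlap (e.g. "adiction" vs "addiction"),
// otherwise the middle length below would be negative.
var maxSuffix = Math.Min(first.Length, second.Length) - prefix.Length;
if (suffix.Length > maxSuffix)
    suffix = suffix.Substring(suffix.Length - maxSuffix);

// Cut and combine.
var middle = first.Substring(prefix.Length,
      first.Length - prefix.Length - suffix.Length);
var result = prefix + "[" + middle + "]" + suffix;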

A simpler and faster way is to use 2 for loops (one from start to end, and one from end to start).
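Here is a minimal sketch of that two-loop approach as a plain method rather than a unit test; the names PrefixSuffixDiff and MarkDifference are hypothetical. It measures the common prefix, then the common suffix, and caps the suffix so it never overlaps the prefix, which plays the same role as the repeated-letter trim in Answer 0.

```csharp
using System;

public static class PrefixSuffixDiff
{
    // Hypothetical helper: returns a's common prefix + "[" + the differing
    // middle of a + "]" + the common suffix, compared against b.
    public static string MarkDifference(string a, string b)
    {
        // Identical strings: nothing to mark (same behavior as Answer 0).
        if (a == b)
            return a;

        int shorter = Math.Min(a.Length, b.Length);

        // Loop 1 (start to end): length of the common prefix.
        int prefix = 0;
        while (prefix < shorter && a[prefix] == b[prefix])
            prefix++;

        // Loop 2 (end to start): length of the common suffix, capped so it
        // cannot reuse characters already counted in the prefix
        // (e.g. "adiction" vs "addiction").
        int suffix = 0;
        while (suffix < shorter - prefix &&
               a[a.Length - 1 - suffix] == b[b.Length - 1 - suffix])
            suffix++;

        string middle = a.Substring(prefix, a.Length - prefix - suffix);
        return a.Substring(0, prefix) + "[" + middle + "]" + a.Substring(a.Length - suffix);
    }
}
```

For example, MarkDifference("ato", "auto") gives a[]to and MarkDifference("adiction", "addiction") gives ad[]iction. Because prefix + suffix never exceeds the shorter length, every Substring call stays in range, including when one word is a prefix or suffix of the other.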