Compare two arrays of different lengths and show the differences

Asked: 2009-06-21 02:01:05

Tags: c# arrays

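Question:
I have two arrays that may be of different lengths. I need to iterate through both arrays and find the items that are the same, the additions, and the removals.

What is the fastest and most efficient way to accomplish this in C#?

Edit: The arrays are pre-sorted and can contain anywhere between 50 and 100 items. Also, there are no constraints on speed and/or memory usage (however, no one likes a memory hog ;)


For example: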

String[] Foo_Old = {"test1", "test2", "test3"};
String[] Foo_New = {"test1", "test2", "test4", "test5"};

String[] Bar_Old = {"test1", "test2", "test4"};
String[] Bar_New = {"test1", "test3"};

Differences:

(with respect to the Foo_New array)

[Same]    "test1"
[Same]    "test2"
[Removed] "test3"
[Added]   "test4"
[Added]   "test5"

(with respect to the Bar_New array)

[Same]    "test1"
[Removed] "test2"
[Removed] "test4"
[Added]   "test3"

4 answers:

Answer 0 (score: 18)

You can use Except and Intersect ...

// Requires: using System; using System.Linq;
var Foo_Old = new[] { "test1", "test2", "test3" };
var Foo_New = new[] { "test1", "test2", "test4", "test5" };

var diff = Foo_New.Except( Foo_Old );
var inter = Foo_New.Intersect( Foo_Old );
var rem = Foo_Old.Except(Foo_New);

foreach (var s in diff)
{
    Console.WriteLine("Added " + s);
}

foreach (var s in inter)
{
    Console.WriteLine("Same " + s);
}

foreach (var s in rem)
{
    Console.WriteLine("Removed " + s);
}

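Answer 1 (score: 3)

I went ahead and hand-coded a version and timed it against the example in the accepted answer; the hand-coded one performs slightly better. I handled the string output a little differently. Other factors to consider: does Except make a sorted copy of the array (since it cannot assume the input is already sorted), or does it do some kind of hashing or a linear search? (It is limited to what IEnumerable offers, which could be a problem for very large arrays that are already sorted.) You could change my version to compare IEnumerable (which is more general) instead of IComparable[].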

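// Requires: using System; using System.Diagnostics; using System.Linq;
// Both arrays must already be sorted in the order defined by CompareTo.
static void ArrayCompare(IComparable[] Old, IComparable[] New)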
{
    int lpOld = 0;
    int lpNew = 0;
    int OldLength = Old.Length;
    int NewLength = New.Length;
    while (lpOld < OldLength || lpNew < NewLength)
    {
        int compare;

        if (lpOld >= OldLength) compare = 1;
        else if (lpNew >= NewLength) compare = -1;
        else compare = Old[lpOld].CompareTo(New[lpNew]);

        if (compare < 0)
        {
            Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
            lpOld++;
        }
        else if (compare > 0)
        {
            Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
            lpNew++;
        }
        else
        {
            Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
            lpOld++;
            lpNew++;
        }
    }
}

static void ArrayCompare2(IComparable[] Old, IComparable[] New) {
    var diff = New.Except( Old );
    var inter = New.Intersect( Old );
    var rem = Old.Except(New);

    foreach (var s in diff)
    {
        Debug.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Debug.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Debug.WriteLine("Removed " + s);
    }
}

static void Main(string[] args)
{
    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};
    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

    Stopwatch w1 = new Stopwatch();
    w1.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare(Foo_Old, Foo_New);
        ArrayCompare(Bar_Old, Bar_New);
    }
    w1.Stop();

    Stopwatch w2 = new Stopwatch();
    w2.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare2(Foo_Old, Foo_New);
        ArrayCompare2(Bar_Old, Bar_New);
    }
    w2.Stop();

    Debug.WriteLine(w1.Elapsed.ToString());
    Debug.WriteLine(w2.Elapsed.ToString());
}
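
The closing suggestion above (comparing IEnumerable rather than IComparable[]) could look roughly like the following. This is only a sketch, not the answerer's code: the names `Change`, `SortedDiff.Compare` and `SortedDiffDemo` are made up for illustration, and it assumes both sequences are already sorted according to the comparer that is passed in.

```csharp
using System;
using System.Collections.Generic;

public enum Change { Same, Removed, Added }

public static class SortedDiff
{
    // Walks two sorted sequences in lockstep, like the merge step of merge sort.
    // Assumption: both inputs are sorted according to `comparer`
    // (Comparer<T>.Default is used when none is supplied).
    public static IEnumerable<KeyValuePair<Change, T>> Compare<T>(
        IEnumerable<T> oldItems, IEnumerable<T> newItems, IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;
        using (IEnumerator<T> o = oldItems.GetEnumerator())
        using (IEnumerator<T> n = newItems.GetEnumerator())
        {
            bool hasOld = o.MoveNext();
            bool hasNew = n.MoveNext();
            while (hasOld || hasNew)
            {
                int c;
                if (!hasOld) c = 1;          // old side exhausted: the rest was added
                else if (!hasNew) c = -1;    // new side exhausted: the rest was removed
                else c = comparer.Compare(o.Current, n.Current);

                if (c < 0)
                {
                    yield return new KeyValuePair<Change, T>(Change.Removed, o.Current);
                    hasOld = o.MoveNext();
                }
                else if (c > 0)
                {
                    yield return new KeyValuePair<Change, T>(Change.Added, n.Current);
                    hasNew = n.MoveNext();
                }
                else
                {
                    yield return new KeyValuePair<Change, T>(Change.Same, o.Current);
                    hasOld = o.MoveNext();
                    hasNew = n.MoveNext();
                }
            }
        }
    }
}

public static class SortedDiffDemo
{
    public static void Main()
    {
        string[] fooOld = { "test1", "test2", "test3" };
        string[] fooNew = { "test1", "test2", "test4", "test5" };

        // Ordinal comparison matches the sort order of these sample strings.
        foreach (var change in SortedDiff.Compare(fooOld, fooNew, StringComparer.Ordinal))
            Console.WriteLine("[{0}] {1}", change.Key, change.Value);
    }
}
```

Because the method is an iterator, nothing is compared until the result is enumerated, and each input is read only once.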

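Answer 2 (score: 1)

Since your arrays are sorted, you should be able to walk through both arrays simultaneously and determine, in a single pass, whether each element is in the other array. (This is similar to the merge step in merge sort.) Here is an example: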

string[] oldVersion = { "test1", "test2", "test3" };
string[] newVersion = { "test1", "test2", "test4", "test5" };

int oldIndex = 0, newIndex = 0;

while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length)) {
   int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

   if (comparison < 0)
      Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
   else if (comparison > 0)
      Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
   else {
      Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
      newIndex++;
   }
}

while (oldIndex < oldVersion.Length)
   Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

while (newIndex < newVersion.Length)
   Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

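Otherwise, you would need to loop through one array and, for each of its elements, make a pass over the other array looking for a match.

Edit: JP has a good suggestion on how to do this using the framework. However, assuming the arrays are sorted, the benefit of my approach is that you only need one pass to find all the results. You do not have to make three passes.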
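
For comparison, the nested-scan fallback just described might be sketched as below. It does not rely on the arrays being sorted, but it does O(n·m) work. The names `UnsortedDiff` and `Describe` are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class UnsortedDiff
{
    // Returns one line per element: "[Same] x", "[Removed] x" or "[Added] x".
    // Each element triggers a linear scan of the other array, so the
    // input order does not matter, but the cost grows as n * m.
    public static List<string> Describe(string[] oldVersion, string[] newVersion)
    {
        var lines = new List<string>();

        // For each old element, scan the new array for a match.
        foreach (string item in oldVersion)
        {
            bool found = Array.IndexOf(newVersion, item) >= 0;
            lines.Add((found ? "[Same] " : "[Removed] ") + item);
        }

        // Anything in the new array that the old array lacks was added.
        foreach (string item in newVersion)
        {
            if (Array.IndexOf(oldVersion, item) < 0)
                lines.Add("[Added] " + item);
        }

        return lines;
    }

    public static void Main()
    {
        // Deliberately unsorted input.
        string[] oldVersion = { "test3", "test1", "test2" };
        string[] newVersion = { "test5", "test1", "test4", "test2" };

        foreach (string line in Describe(oldVersion, newVersion))
            Console.WriteLine(line);
    }
}
```

At 50 to 100 items either approach is cheap; the single-pass merge mainly pays off as the arrays grow.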

Answer 3 (score: 1)

I wrote this a while back:

Usage:

foreach (var diff in Foo_Old.Diff(Foo_New)){
   Console.WriteLine ("{0} action performed on {1}",diff.DiffAction,diff.Value);
}

Implementation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqExtensions {

    enum DiffAction {
       Added,
       Removed,
       Same
    }

    class DiffPair<T> {
        public T Value { get; set; }
        public DiffAction DiffAction { get; set; }
    }

    static class DiffExtension {
        public static IEnumerable<DiffPair<T>> Diff<T>
                 (
                     this IEnumerable<T> original,
                     IEnumerable<T> target 
                 ) {

            Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

            foreach (var item in original) {
                results[item] = DiffAction.Removed;
            }

            foreach (var item in target) {
                if (results.ContainsKey(item)) {
                    results[item] = DiffAction.Same;
                } else {
                    results[item] = DiffAction.Added;
                }
            }
            return results.Select(
                pair => new DiffPair<T> {
                    Value=pair.Key, 
                    DiffAction = pair.Value
                });
        }
    }

}