Finding the differences between two lists

Posted: 2012-04-03 15:08:30

Tags: c# algorithm list optimization comparison

I'm looking for a good way to find the differences between two lists.

The problem is as follows:

Both lists contain strings whose first 3 *-separated fields (numbers/characters) form a unique key, followed by text (String = "key1*key2*key3*text").

Here is an example string:

AA1*1D*4*The quick brown fox*****CC*3456321234543~

where "AA1*1D*4" is the unique key.
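A minimal sketch of pulling the key out of such a line, assuming the key is always exactly the first three *-separated fields. Splitting into at most 4 parts keeps the rest of the line intact, even though the text itself contains more * characters. KeyParser, GetKey and GetText are hypothetical names used only for illustration.

```csharp
using System;

public static class KeyParser
{
    // Returns "key1*key2*key3" for a line such as "key1*key2*key3*text".
    // Assumption: every line has at least three '*'-separated key fields.
    public static string GetKey(string line)
    {
        // At most 4 parts, so '*' characters inside the text are not split further.
        string[] parts = line.Split(new[] { '*' }, 4);
        if (parts.Length < 3)
            throw new FormatException("Line has fewer than 3 key fields: " + line);
        return string.Join("*", parts[0], parts[1], parts[2]);
    }

    // Returns everything after the key, or an empty string when there is no text.
    public static string GetText(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return parts.Length == 4 ? parts[3] : string.Empty;
    }
}
```

For the example string, GetKey returns "AA1*1D*4" and GetText returns "The quick brown fox*****CC*3456321234543~".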

List1: "index1*index2*index3", "index2*index2*index3", "index3*index2*index3"

List2: "index2*index2*index3", "index1*index2*index3", "index3*index2*index3", "index4*index2*index3"

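I need to match up the indexes in the two lists and compare them:

  1. If all 3 indexes of an entry in one list match all 3 indexes of an entry in the other list, I need to keep both string entries together in a new list.

  2. If a set of indexes in one list does not appear in the other list, I need to keep that side and leave an empty entry on the other side (#4 in the example above).

  3. Return the list.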
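Taken together, the three rules amount to a full outer join on the key. A minimal sketch, assuming each key occurs at most once per list and that the key is the first three *-separated fields; null stands in for the "empty entry" on the missing side, and each pair is (Base, Result). ListDiff, KeyOf and Compare are hypothetical names; the case-insensitive comparer mirrors the one used in the question's Except calls.

```csharp
using System;
using System.Collections.Generic;

public static class ListDiff
{
    // Key = the first three '*'-separated fields, e.g. "index1*index2*index3".
    static string KeyOf(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return string.Join("*", parts, 0, Math.Min(3, parts.Length));
    }

    // Returns (Base, Result) pairs; null marks the side where a key is missing.
    // Assumption: keys are unique per list (a duplicate key in resultList throws ArgumentException).
    public static List<Tuple<string, string>> Compare(IEnumerable<string> baseList, IEnumerable<string> resultList)
    {
        var cmp = StringComparer.InvariantCultureIgnoreCase;
        var resultByKey = new Dictionary<string, string>(cmp);
        foreach (string r in resultList)
            resultByKey.Add(KeyOf(r), r);

        var pairs = new List<Tuple<string, string>>();
        var baseKeys = new HashSet<string>(cmp);

        // Rules 1 and 2, base side: every base line, paired with its match or with null.
        foreach (string b in baseList)
        {
            string key = KeyOf(b);
            baseKeys.Add(key);
            string match;
            pairs.Add(Tuple.Create(b, resultByKey.TryGetValue(key, out match) ? match : null));
        }

        // Rule 2, result side: result lines whose key never appeared in the base list.
        foreach (string r in resultList)
        {
            if (!baseKeys.Contains(KeyOf(r)))
                pairs.Add(Tuple.Create((string)null, r));
        }

        return pairs; // Rule 3: the combined list.
    }
}
```

With List1 as the base and List2 as the result, this yields three matched pairs followed by (null, "index4*index2*index3"). Each list is hashed once, so the work is roughly linear instead of comparing every entry against every other entry.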

Here is what I have so far, but I'm struggling a bit:

            List<String> Base = baseListCopy.Except(resultListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values(keep differences in lists)
            List<String> Result = resultListCopy.Except(baseListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values (keep differences in lists)
    
            List<String[]> blocksComparison = new List<String[]>(); //container for non-matching blocks, so we can output them later
    
            //if both reports have same amount of blocks
            if ((Result.Count > 0 || Base.Count > 0) && (Result.Count == Base.Count))
            {
                foreach (String S in Result)
                {
                    String[] sArr = S.Split('*');
                    foreach (String B in Base)
                    {
                        String[] bArr = B.Split('*');
    
                        if (sArr[0].Equals(bArr[0]) && sArr[1].Equals(bArr[1]) && sArr[2].Equals(bArr[2]) && sArr[3].Equals(bArr[3]))
                        {
                            String[] NA = new String[2]; //keep results
                            NA[0] = B; //[0] for base
                            NA[1] = S; //[1] for result
                            blocksComparison.Add(NA);
                            break;
                        }
                    }
                }
            }
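The nested foreach above compares every line of Result with every line of Base, and it only runs when both lists have the same number of differing lines. A hedged alternative for the same pairing step is LINQ's Join, which hashes one side by key, so the counts need not match. This sketch assumes the key is the first three *-separated fields (the loop above also compares the fourth field, sArr[3]); ChangedBlocks and Pair are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChangedBlocks
{
    static string KeyOf(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return string.Join("*", parts, 0, Math.Min(3, parts.Length));
    }

    // Returns {base, result} pairs for lines that share a key but differ in content,
    // in the same String[2] shape that blocksComparison collects.
    public static List<string[]> Pair(List<string> baseListCopy, List<string> resultListCopy)
    {
        var cmp = StringComparer.InvariantCultureIgnoreCase;

        // Drop lines that are identical in both lists, as in the question.
        var changedBase = baseListCopy.Except(resultListCopy, cmp);
        var changedResult = resultListCopy.Except(baseListCopy, cmp);

        // Inner join on the key instead of the O(n*m) nested loop.
        return changedBase
            .Join(changedResult, b => KeyOf(b), r => KeyOf(r), (b, r) => new[] { b, r }, cmp)
            .ToList();
    }
}
```

Lines that have no partner on the other side (rule 2) are simply not part of an inner join's output; they still need a separate pass, for example Except on the keys.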
    

Can you suggest a good algorithm for this process?

Thanks

3 answers:

Answer 0 (score: 3)

You can use a HashSet.

Create a HashSet for List1. Remember that index1*index2*index3 is not the same as index3*index2*index1.

Now iterate over the second list:

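// Create a HashSet for List1 (needs using System.Collections.Generic;).
var set = new HashSet<string>(list1);
var newList = new List<string>();

foreach (string s in list2)
{
    if (set.Contains(s))
        newList.Add(s); // Add it to the new list.
}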
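The HashSet above matches whole strings, so two lines with the same key but different text would not be found. A minimal variant, assuming the key is the first three *-separated fields, stores only the keys of List1. Because each key is kept as an ordered string, index1*index2*index3 and index3*index2*index1 stay distinct. KeyMatch and InBoth are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class KeyMatch
{
    static string KeyOf(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return string.Join("*", parts, 0, Math.Min(3, parts.Length));
    }

    // Returns the lines of list2 whose key also appears in list1.
    // HashSet lookups are O(1) on average, so the whole pass is O(n + m).
    public static List<string> InBoth(IEnumerable<string> list1, IEnumerable<string> list2)
    {
        var keys = new HashSet<string>();
        foreach (string line in list1)
            keys.Add(KeyOf(line));

        var newList = new List<string>();
        foreach (string line in list2)
        {
            if (keys.Contains(KeyOf(line)))
                newList.Add(line);
        }
        return newList;
    }
}
```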

Answer 1 (score: 1)

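import java.util.*;

List<String> one = new ArrayList<String>();
List<String> two = new ArrayList<String>();
List<String> three = new ArrayList<String>();
Map<String, Integer> intersect = new HashMap<String, Integer>();

// Count how often each string occurs in the first list.
for (String index : one)
{
    Integer count = intersect.get(index);
    intersect.put(index, count == null ? 1 : count + 1);
}

// Keep every string of the second list that also occurs in the first.
for (String index : two)
{
    if (intersect.containsKey(index))
    {
        three.add(index);
    }
}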
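The map above records how often each string occurs in the first list, but afterwards only containsKey is checked, so a string that occurs once in "one" and twice in "two" ends up in "three" twice. If duplicates should be matched one-for-one, a hedged C# variant can decrement the count on each match. MultisetIntersect is a hypothetical name, and strings are compared whole, as in the answer.

```csharp
using System.Collections.Generic;

public static class MultisetIntersect
{
    // Returns the items of 'two' that can be matched against an unused occurrence in 'one'.
    public static List<string> Intersect(IEnumerable<string> one, IEnumerable<string> two)
    {
        // Count occurrences in the first list.
        var counts = new Dictionary<string, int>();
        foreach (string s in one)
        {
            int c;
            counts.TryGetValue(s, out c); // c is 0 when s has not been seen yet
            counts[s] = c + 1;
        }

        var three = new List<string>();
        foreach (string s in two)
        {
            int c;
            if (counts.TryGetValue(s, out c) && c > 0)
            {
                three.Add(s);
                counts[s] = c - 1; // consume one occurrence
            }
        }
        return three;
    }
}
```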

Answer 2 (score: 1)

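If I understand your question correctly, you want to compare elements by their "key" prefix rather than by the whole string. If so, implementing a custom equality comparer lets you easily take advantage of the LINQ set operations.

This program...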

using System;
using System.Collections.Generic;
using System.Linq;

class EqCmp : IEqualityComparer<string> {

    public bool Equals(string x, string y) {
        return GetKey(x).SequenceEqual(GetKey(y));
    }

    public int GetHashCode(string obj) {
        // Using Sum could cause OverflowException.
        return GetKey(obj).Aggregate(0, (sum, subkey) => sum + subkey.GetHashCode());
    }

    static IEnumerable<string> GetKey(string line) {
        // If we just split to 3 strings, the last one could exceed the key, so we split to 4.
        // This is not the most efficient way, but is simple.
        return line.Split(new[] { '*' }, 4).Take(3);
    }

}

class Program {

    static void Main(string[] args) {

        var l1 = new List<string> {
            "index1*index1*index1*some text",
            "index1*index1*index2*some text ** test test test",
            "index1*index2*index1*some text",
            "index1*index2*index2*some text",
            "index2*index1*index1*some text"
        };

        var l2 = new List<string> {
            "index1*index1*index2*some text ** test test test",
            "index2*index1*index1*some text",
            "index2*index1*index2*some text"
        };

        var eq = new EqCmp();

        Console.WriteLine("Elements that are both in l1 and l2:");
        foreach (var line in l1.Intersect(l2, eq))
            Console.WriteLine(line);

        Console.WriteLine("\nElements that are in l1 but not in l2:");
        foreach (var line in l1.Except(l2, eq))
            Console.WriteLine(line);

        // Etc...

    }

}

...prints the following:

Elements that are both in l1 and l2:
index1*index1*index2*some text ** test test test
index2*index1*index1*some text

Elements that are in l1 but not in l2:
index1*index1*index1*some text
index1*index2*index1*some text
index1*index2*index2*some text
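A note on the GetHashCode in EqCmp: adding up the sub-key hashes is order-insensitive, so index1*index2*index3 and index3*index2*index1, which Equals treats as different, always get the same hash code. That is still correct (equal keys only need equal hashes), but such permutations collide. A hedged alternative is the common multiply-and-add combination, wrapped in unchecked so overflow wraps instead of throwing even in a checked build; Enumerable.Sum over int uses checked arithmetic, which is what the original comment about OverflowException refers to. OrderedKeyComparer is a hypothetical name, and 17/31 are conventional constants, not requirements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class OrderedKeyComparer : IEqualityComparer<string>
{
    // Same key definition as EqCmp: the first three '*'-separated fields.
    static string[] GetKey(string line)
    {
        return line.Split(new[] { '*' }, 4).Take(3).ToArray();
    }

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return GetKey(x).SequenceEqual(GetKey(y));
    }

    public int GetHashCode(string obj)
    {
        unchecked // overflow wraps around instead of throwing
        {
            int hash = 17;
            foreach (string subkey in GetKey(obj))
                hash = hash * 31 + subkey.GetHashCode(); // position-dependent combination
            return hash;
        }
    }
}
```

It can be passed to Intersect and Except exactly like EqCmp.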