Question

I am using the following query:

var queryList1Only = (from file in list1
                                  select file).Except(list2, myFileCompare);

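myFileCompare compares two files by name and length.

When list1 and list2 are small (say 100 files each, as in my tests), the query returns results. I then grew list1 to 30,000 files and list2 to 20,000 files, and the query now shows "Function Evaluation Timed Out".

I searched online and found that the debugger can cause this, so I removed all breakpoints and ran the code. Now the program simply freezes, with no output for queryList1Only, which I am trying to print in order to check it.

Edit: here is the code for myFileCompare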

class FileCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
    {

        public FileCompare() { }

        public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
        {
            return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name && 
                    f1.Length == f2.Length);
        }

        // Return a hash that reflects the comparison criteria. According to the 
        // rules for IEqualityComparer<T>, if Equals is true, then the hash codes must
        // also be equal. Because equality as defined here is a simple value equality, not
        // reference identity, it is possible that two or more objects will produce the same
        // hash code.
        public int GetHashCode(System.IO.FileInfo fi)
        {
            string s = String.Format("{0}{1}", fi.Name, fi.Length);
            return s.GetHashCode();
        }

    }

Answer 1

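What do you need to do with the items the query returns? Heavy operations like this are generally better run on a separate thread, which avoids the situation you just ran into.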
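As a rough illustration of the separate-thread idea, the sketch below pushes the Except query onto a thread-pool thread. It assumes .NET 4.5 or later (for Task.Run); the class name BackgroundDiff and the method ExceptAsync are hypothetical and not part of the question's code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class BackgroundDiff
{
    // Runs the set difference on a thread-pool thread.
    // ToList() forces the lazy LINQ query to execute inside the task,
    // instead of later on whichever thread enumerates the result.
    public static Task<List<T>> ExceptAsync<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer)
    {
        return Task.Run(() => first.Except(second, comparer).ToList());
    }
}
```

With the question's types this would be called as BackgroundDiff.ExceptAsync(list1, list2, myFileCompare) and awaited. It keeps the caller responsive but does not make the comparison itself any faster.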

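Edit: an idea

As one option, you could try the following algorithm:

Sort the items in both arrays with QuickSort (List<T>.Sort() uses it by default), and implement GetHashCode()
Then walk both lists in a plain for() loop, comparing the elements at the same index
When the loop counter reaches the last index of one list, treat all remaining items of the other list as different (they simply do not exist in the first one).

I believe sorting the arrays gives better performance. I think Except() has O(m * n) complexity.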
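The steps above are only outlined, so here is one possible reading of them as a sketch: sort both lists with the same ordering, then walk them with two indices instead of a single shared index. The class name SortedDifference is hypothetical, and the caller is assumed to supply an IComparer<T> (for files, for example, one that orders by name and then by length).

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class SortedDifference
{
    // Returns the items of 'first' that have no equal item in 'second'.
    // Both lists are sorted in place, then walked once with two indices.
    public static List<T> Except<T>(List<T> first, List<T> second, IComparer<T> comparer)
    {
        first.Sort(comparer);
        second.Sort(comparer);

        var result = new List<T>();
        int i = 0, j = 0;
        while (i < first.Count && j < second.Count)
        {
            int order = comparer.Compare(first[i], second[j]);
            if (order < 0)
            {
                // first[i] sorts before everything left in 'second': no match exists.
                result.Add(first[i]);
                i++;
            }
            else if (order > 0)
            {
                // second[j] is smaller than every remaining item of 'first'.
                j++;
            }
            else
            {
                // Match found: skip it, but keep j so duplicates in 'first' also match.
                i++;
            }
        }

        // Whatever is left in 'first' has no counterpart in 'second'.
        for (; i < first.Count; i++)
        {
            result.Add(first[i]);
        }
        return result;
    }
}
```

Unlike Enumerable.Except, this sketch keeps duplicate items of the first list when they are absent from the second, and it reorders both input lists.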

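Edit: another idea, which should be very fast

Store the items from one server in a Set<T>
Then loop over the second array and look each item up in the Set<T>. This will be very fast: roughly O(mlogm) + O(n), because you traverse a single array and search in a set with a good hash function (use GetHashCode(); I have provided the updated logic). Try it!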

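// A C# sketch: returns the files of the second server that are missing on the first.
public IEnumerable<FileInfo> GetDifference(
    IEnumerable<FileInfo> firstServerFiles,
    IEnumerable<FileInfo> secondServerFiles,
    IEqualityComparer<FileInfo> comparer)
{
    // Build the set once. Passing the comparer matters: without it,
    // HashSet<FileInfo> falls back to reference equality.
    ISet<FileInfo> firstServerFilesMap = new HashSet<FileInfo>(firstServerFiles, comparer);

    foreach (var secondServerFile in secondServerFiles)
    {
        // Each lookup goes through the comparer's GetHashCode() and Equals().
        if (!firstServerFilesMap.Contains(secondServerFile))
        {
            yield return secondServerFile;
        }
    }
}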

Edit: more details on the equality logic were provided in the comments

Try this approach:

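public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
{
      // Same instance, or both null: equal.
      if (ReferenceEquals(f1, f2))
      {
          return true;
      }

      // Exactly one of them is null: not equal.
      if (f1 == null || f2 == null)
      {
          return false;
      }

      return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name && 
             f1.Length == f2.Length);
}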

public int GetHashCode(System.IO.FileInfo fi)
{
    unchecked
    {
        int hash = 17;    
        hash = hash * 23 + fi.Name.GetHashCode();
        hash = hash * 23 + fi.Directory.Name.GetHashCode();
        hash = hash * 23 + fi.Length.GetHashCode();

        return hash;
    }
}


Answer 2

I have not tried this myself, but here is an idea: implement list1 as a HashSet, like this:

HashSet<FileInfo> List1 = new HashSet<FileInfo>(myFileCompare);

Add all the files:

foreach(var file in files)
{
    List1.Add(file);
}

Then remove the elements:

List1.ExceptWith(list2);

Then enumerate:

foreach(var file in List1)
{
    //do something
}

I think this is faster, but as I said, I have not tried it.

Edit: or better, you can initialize and add the data in one step:

HashSet<FileInfo> List1 = new HashSet<FileInfo>(files, myFileCompare);

Answer 3

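I suggest removing the length from the hash code and hashing only fi.FullName. This still honors the uniqueness guidelines, although hash collisions are possible in the cases where you need the length to tell files apart. That is probably preferable to a long-running Except. Likewise, change your equality comparison from name and directory to FullName, which may also be more efficient.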
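A minimal sketch of that suggestion follows. The class name FullNameFileComparer is hypothetical, and it assumes FullName is a meaningful identity for your two lists, which holds only if both refer to the same root path.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical comparer for illustration only.
public sealed class FullNameFileComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo f1, FileInfo f2)
    {
        if (ReferenceEquals(f1, f2)) return true;
        if (f1 == null || f2 == null) return false;

        // Compare the paths first; Length is only read when the paths match.
        return f1.FullName == f2.FullName && f1.Length == f2.Length;
    }

    public int GetHashCode(FileInfo fi)
    {
        // Hash on the path alone. Files that are equal under Equals()
        // share a FullName, so they always get the same hash code.
        return fi.FullName.GetHashCode();
    }
}
```

It can be passed wherever myFileCompare is used, for example list1.Except(list2, new FullNameFileComparer()).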

Function evaluation timed out for large lists (LINQ query in C#)
