Nested loops are slow when comparing large data sets

Asked: 2012-10-15 20:47:37

Tags: c# linq entity-framework

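I'm using LINQ to Entities to retrieve two lists of data. Both live in the same database, but I need to convert one table into my Tasks table because that table is integrated into my calendar. I'm fairly sure the details aren't worth going into here, but I'd like to speed up the process of matching IDs and creating the new Tasks objects. This is a one-and-done snippet, so even if it is slow I can let the program run overnight. Still, for future reference, I'd like some suggestions for making it more efficient.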

var accounts = data.Accounts.ToList().OrderBy(a => a.ID);
Incidents[] Incidents = data.Incidents.ToArray();

        for (int i=0;i<Incidents.Length;i++)
        {
            foreach (var a in accounts)
            {
                if (a.Acct_CID == Incidents[i].CustomerID)
                {
                    Tasks t = new Tasks();
                    t.creator_id = a.ID;
                    t.start_date = Incidents[i].DateOpened;
                    t.end_date = Incidents[i].DateCLosed;
                    t.product_code = Incidents[i].ProductCode;
                    t.install_type = Incidents[i].InstallType;
                    t.os = Incidents[i].OSType;
                    t.details = Incidents[i].Description;
                    t.solution = Incidents[i].Solution;
                    t.creator_name = Incidents[i].TechID;
                    t.category = Incidents[i].Title;
                    t.text = "Ticket for" + " " + Incidents[i].Name;
                    if (t.end_date == DateTime.MinValue || t.end_date == null)
                        t.status_id = 6;
                    else t.status_id = 7;
                    data.Tasks.Add(t);
                    break;
                }
            }
        }
        data.SaveChanges();

5 Answers:

Answer 0 (score: 3)

Why not join the tables and create the tasks on the fly?

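// Note: in classic Entity Framework (EF 4 to 6), projecting straight into a
// mapped entity type such as Tasks inside the query usually fails with
// NotSupportedException; projecting into an anonymous type first avoids that.
var tasks = from i in data.Incidents
            join a in data.Accounts on i.CustomerID equals a.Acct_CID
            select new Tasks()
            {
                creator_id = a.ID,
                start_date = i.DateOpened,
                end_date = i.DateCLosed
                // ...
            };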

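By the way, I don't think the sorting serves any purpose here, since the order in which the created tasks are added to the database does not matter.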

// Query will not be executed until here
foreach(var task in tasks)
   data.Tasks.Add(task);
data.SaveChanges();
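The query above elides the remaining fields, including the status_id rule from the question. As a rough sketch only, here is the same join run against in-memory collections with that rule applied. The anonymous sample objects and the StatusFor helper are assumptions made for illustration, not part of the asker's model.

```csharp
using System;
using System.Linq;

public static class JoinDemo
{
    // Same rule as the question: no usable close date means status 6, otherwise 7.
    public static int StatusFor(DateTime? closed)
    {
        return (closed == null || closed == DateTime.MinValue) ? 6 : 7;
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for data.Accounts and data.Incidents.
        var accounts = new[]
        {
            new { ID = 1, Acct_CID = "C1" },
            new { ID = 2, Acct_CID = "C2" }
        };
        var incidents = new[]
        {
            new { CustomerID = "C2", DateClosed = (DateTime?)null },
            new { CustomerID = "C1", DateClosed = (DateTime?)new DateTime(2012, 10, 1) },
            new { CustomerID = "C9", DateClosed = (DateTime?)null } // no matching account
        };

        // An inner join drops incidents that have no matching account.
        var tasks = from i in incidents
                    join a in accounts on i.CustomerID equals a.Acct_CID
                    select new { creator_id = a.ID, status_id = StatusFor(i.DateClosed) };

        foreach (var t in tasks)
            Console.WriteLine(t.creator_id + " -> " + t.status_id);
        // prints "2 -> 6" then "1 -> 7"
    }
}
```

An inner join also matches every account that shares a customer ID, whereas the original loop stops at the first match because of the break, so the two only agree when Acct_CID is unique.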

Answer 1 (score: 3)

I would join the results on the database side

var joinedResult = data.Accounts.Join(data.Incidents, 
                                      a => a.Acct_CID, 
                                      i => i.CustomerID, 
                                      (a, i) => new { Account = a, Incident = i });

foreach (var item in joinedResult)
{
    Tasks t = new Tasks();
    t.creator_id = item.Account.ID;
    t.start_date = item.Incident.DateOpened;
    ........

}

Answer 2 (score: 1)

Replace this line

var accounts = data.Accounts.ToList().OrderBy(a => a.ID);

with this

var accounts = data.Accounts.OrderBy(a => a.ID).ToList();

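This lets the database do the sorting and then caches the sorted result. What you have now loads the accounts and then re-sorts them every time you reach the foreach loop, because accounts is enumerated again each time.

I can't say it will be a huge improvement, but if your data set is large enough, re-sorting a large list many times will definitely slow you down.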
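To make the difference visible, here is a small sketch using LINQ to Objects only. It counts how often the key selector runs as a stand-in for sorting work. The class and method names are made up for the demonstration, and the list of integers merely stands in for the accounts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSortDemo
{
    // Returns how many times the key selector ran across all enumerations.
    public static int CountKeySelectorCalls(bool materialize, int enumerations)
    {
        var ids = new List<int> { 3, 1, 2 };
        int calls = 0;

        // OrderBy is lazy: nothing is sorted until the sequence is enumerated,
        // and the sort is repeated on every new enumeration.
        IEnumerable<int> sorted = ids.OrderBy(id => { calls++; return id; });

        // Calling ToList() after OrderBy sorts once and keeps the sorted copy.
        if (materialize)
            sorted = sorted.ToList();

        for (int n = 0; n < enumerations; n++)
        {
            foreach (var item in sorted) { }
        }

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine("lazy:   " + CountKeySelectorCalls(false, 4));
        Console.WriteLine("cached: " + CountKeySelectorCalls(true, 4));
    }
}
```

The lazy variant should report several times as many selector calls as the cached one, which mirrors what happens when the inner foreach runs once per incident.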


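On a closer look, not only are you sorting accounts every time, you also seem to be looking for only a small subset of the records while iterating over the entire collection. Consider replacing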

    foreach (var a in accounts)
    {
        if (a.Acct_CID == Incidents[i].CustomerID)
        {

with

    foreach (var a in accounts.Where(acct => acct.Acct_CID == Incidents[i].CustomerID))
    {

Answer 3 (score: 1)

Create a lookup of the accounts

var accountsLookup = data.Accounts.ToLookup(a => a.Acct_CID);
foreach (var incident in data.Incidents)
{
    foreach (var a in accountsLookup[incident.CustomerID])
    {
        Tasks t = new Tasks();
        t.creator_id = a.ID;
        ...
    }
}
data.SaveChanges();

If the accounts are unique, you can also create a dictionary

var accountsDict = data.Accounts.ToDictionary(a => a.Acct_CID);
foreach (var incident in data.Incidents)
{
    Account a;
    if (accountsDict.TryGetValue(incident.CustomerID, out a))
    {
        Tasks t = new Tasks();
        t.creator_id = a.ID;
        ...
    }
}
data.SaveChanges();

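This is even faster than the first variant. Note that a dictionary has a constant lookup time that does not depend on its size, so you basically get O(n) execution time for the loop. Your original implementation has O(n^2) execution time.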
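As a self-contained illustration of the dictionary variant, the sketch below runs against in-memory lists instead of the database. The Account and Incident classes are hypothetical stand-ins that carry only the fields needed here, and the string type of the customer ID is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entity classes; only the fields used below.
public class Account { public int ID; public string Acct_CID; }
public class Incident { public string CustomerID; public string Name; }

public static class LookupDemo
{
    // Returns (account ID, incident name) pairs, one per matched incident.
    public static List<KeyValuePair<int, string>> Match(
        IEnumerable<Account> accounts, IEnumerable<Incident> incidents)
    {
        // One pass over the accounts builds the index. ToDictionary throws
        // ArgumentException if two accounts share the same Acct_CID.
        Dictionary<string, Account> byCid = accounts.ToDictionary(a => a.Acct_CID);

        var matches = new List<KeyValuePair<int, string>>();
        foreach (var incident in incidents)
        {
            Account a;
            // A hash lookup (constant time on average) replaces the inner loop.
            // TryGetValue rejects null keys, so those incidents are skipped.
            if (incident.CustomerID != null && byCid.TryGetValue(incident.CustomerID, out a))
                matches.Add(new KeyValuePair<int, string>(a.ID, incident.Name));
        }
        return matches;
    }

    public static void Main()
    {
        var accounts = new List<Account>
        {
            new Account { ID = 1, Acct_CID = "C1" },
            new Account { ID = 2, Acct_CID = "C2" }
        };
        var incidents = new List<Incident>
        {
            new Incident { CustomerID = "C2", Name = "printer" },
            new Incident { CustomerID = "C9", Name = "orphan" }, // no matching account
            new Incident { CustomerID = "C1", Name = "login" }
        };

        foreach (var m in Match(accounts, incidents))
            Console.WriteLine(m.Key + ": " + m.Value);
        // prints "2: printer" then "1: login"
    }
}
```

Building the dictionary costs one pass over the accounts and each incident then needs a single lookup, so the total work grows with the sum of the two collection sizes rather than their product.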

Answer 4 (score: 0)

    var tasks = (from i in data.Incidents
                     join a in data.Accounts on i.CustomerID equals a.Acct_CID
                     select new
                     {
                         creator_id = a.ID,
                         start_date = i.DateOpened,
                         end_date = i.DateCLosed,
                         product_code = i.ProductCode,
                         install_type = i.InstallType,
                         os = i.OSType,
                         details = i.Description,
                         solution = i.Solution,
                         creator_name = i.TechID,
                         category = i.Title,
                         text = "Ticket for" + " " + i.Name,
                         status_id = 7
                     }).AsEnumerable().Select(x => new
                         {
                             x.creator_id,
                             x.start_date,
                             x.end_date,
                             x.product_code,
                             x.os,
                             x.details,
                             x.solution,
                             x.creator_name,
                             x.category,
                             x.text,
                             x.install_type,
                             x.status_id
                         });


        foreach (var item in tasks)
        {
            Tasks t = new Tasks();
            t.os = item.os;
            t.creator_id = item.creator_id;
            t.install_type = item.install_type;
            t.start_date = item.start_date;
            t.end_date = item.end_date;
            t.solution = item.solution;
            t.details = item.details;
            t.creator_name = item.creator_name;
            t.category = item.category;
            t.text = item.text;
            t.product_code = item.product_code;
            if (t.end_date == DateTime.MinValue || t.end_date == null)
                t.status_id = 6;
            else t.status_id = 7;
            data.Tasks.Add(t);
        }
        data.SaveChanges();