Complex LINQ query

Date: 2013-10-01 14:04:50

Tags: c# .net linq

I have a database table with 2 fields: index (int) and email (varchar(100)).

I need to do the following:

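  1. Group all emails by domain (all emails are already lowercase).
  2. Select the emails from all groups, taking from each domain no more than 20% of the total number of emails (counted before step 1).
  3. Code example: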

        DataContext db = new DataContext();
    
        //Domains to group by
        List<string> domains = new List<string>() { "gmail.com", "yahoo.com", "hotmail.com" };
    
        Dictionary<string, List<string>> emailGroups = new Dictionary<string, List<string>>();
    
        //Init dictionary
        foreach (string thisDomain in domains)
        {
            emailGroups.Add(thisDomain, new List<string>());
        }
    
        //Get distinct emails
        var emails = db.Clients.Select(x => x.Email).Distinct();
    
        //Total emails
        int totalEmails = emails.Count();
    
        //Maximum number of emails per domain: 20% of the total, rounded down.
        //Multiplying before dividing avoids totalEmails / 100 truncating to 0
        //when there are fewer than 100 emails.
        int maxPerDomain = totalEmails * 20 / 100;
    
        //Run on each email
        foreach (var thisEmail in emails)
        {
            //Run on each domain
            foreach (string thisDomain in emailGroups.Keys)
            {
                //If the email belongs to this domain; matching "@domain" at the end
                //keeps e.g. "user@notgmail.com" out of the "gmail.com" group
                if (thisEmail.EndsWith("@" + thisDomain))
                {
                    //Add to dictionary
                    emailGroups[thisDomain].Add(thisEmail);
                }
            }
        }
    
        //Will store the final result
        List<string> finalEmails = new List<string>();
    
        //Run on each domain
        foreach (string thisDomain in emailGroups.Keys)
        {
            //Take at most maxPerDomain emails from this domain (all of them if the
            //group is smaller). The emails are distinct and each belongs to one
            //domain, so AddRange is enough and Union is not needed.
            finalEmails.AddRange(emailGroups[thisDomain].Take(maxPerDomain));
        }
    

Does anyone know a better way?
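For comparison with the loop above, here is a minimal in-memory sketch of the same steps as a single LINQ pipeline. It assumes the emails are already loaded into memory (rather than queried through db.Clients) and takes the domain as the text after the last '@'; DomainCap, DomainOf and CapPerDomain are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainCap
{
    // Text after the last '@'; "" when there is no '@'.
    public static string DomainOf(string email)
    {
        int at = email.LastIndexOf('@');
        return at < 0 ? "" : email.Substring(at + 1);
    }

    // Mirrors the question's loop: from each listed domain keep at most 20% of the
    // distinct total (rounded down); emails from other domains are ignored.
    public static List<string> CapPerDomain(IEnumerable<string> allEmails, ICollection<string> domains)
    {
        List<string> emails = allEmails.Distinct().ToList();
        int maxPerDomain = emails.Count * 20 / 100;

        return emails
            .GroupBy(e => DomainOf(e))
            .Where(g => domains.Contains(g.Key))
            .SelectMany(g => g.Take(maxPerDomain))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var emails = new List<string>
        {
            "a@gmail.com", "b@gmail.com", "c@gmail.com", "d@gmail.com", "e@yahoo.com",
            "f@hotmail.com", "g@other.org", "h@other.org", "i@other.org", "j@other.org"
        };
        var domains = new[] { "gmail.com", "yahoo.com", "hotmail.com" };

        // 10 emails -> at most 2 per domain
        // prints a@gmail.com, b@gmail.com, e@yahoo.com, f@hotmail.com (one per line)
        foreach (string e in DomainCap.CapPerDomain(emails, domains))
            Console.WriteLine(e);
    }
}
```

Take(n) simply returns the whole group when it has fewer than n elements, which is why no separate "more than 20%" branch is needed.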

3 Answers:

Answer 0 (score: 2)

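I can't think of a way to do this without hitting the database at least twice: once for the overall count and once for the grouping. You could try something like: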

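    var maxCount = db.Users.Count() * 0.2;   // first hit: overall count

    var query = from u in db.Users
                // domain = text after '@' (string.Split cannot be translated to SQL)
                group u by u.Email.Substring(u.Email.IndexOf("@") + 1) into g
                where g.Count() <= maxCount   // second hit: grouping and filter
                select new
                {
                    Domain = g.Key,
                    Users = g.ToList()
                };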
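Note that this query drops any domain above the limit entirely instead of trimming it to 20%. A minimal in-memory sketch of that behaviour, assuming a hypothetical User class and a List<User> standing in for the db.Users table; the 20% test is written as 5 * count <= total to stay in integer arithmetic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Users table.
public class User
{
    public int Index { get; set; }
    public string Email { get; set; }
}

public static class DomainFilter
{
    // Keeps only the domains whose user count is at most 20% of all users;
    // a domain above the limit is dropped entirely, not truncated.
    public static Dictionary<string, List<User>> SmallDomains(IReadOnlyCollection<User> users)
    {
        int total = users.Count;   // corresponds to the separate count query

        return users
            .GroupBy(u => u.Email.Substring(u.Email.IndexOf('@') + 1))
            .Where(g => 5 * g.Count() <= total)   // count <= 20% of total, in integers
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}

public static class Program
{
    public static void Main()
    {
        string[] emails = { "a@gmail.com", "b@gmail.com", "c@gmail.com", "d@yahoo.com", "e@hotmail.com",
                            "f@x.org", "g@y.org", "h@z.org", "i@w.org", "j@v.org" };
        var users = emails.Select((e, i) => new User { Index = i, Email = e }).ToList();

        // gmail.com has 3 of 10 users (30%), so it is not in the result
        foreach (var pair in DomainFilter.SmallDomains(users))
            Console.WriteLine($"{pair.Key}: {pair.Value.Count}");
    }
}
```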

Answer 1 (score: 1)

Assuming you want the last items (by index) of each group, kept in ascending order:

    int m = (int) (input.Count() * 0.2);   // 20% of all rows
    var result = input.GroupBy(x=>x.email.Split('@')[1],
                               (key,g)=>g.OrderByDescending(x=>x.index).Take(m)
                                         .OrderBy(x=>x.index))
                      .SelectMany(g=>g); //flatten the groups into one sequence; omit to keep the grouping

Or this:

    var result = input.GroupBy(x=>x.email.Split('@')[1],
                               (key,g)=>g.OrderBy(x=>x.index)
                                         .Skip(g.Count()-m))
                      .SelectMany(g=>g); //flatten the groups into one sequence; omit to keep the grouping
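Both variants can be checked side by side in memory. The sketch below assumes input is a list of (Index, Email) tuples, a hypothetical stand-in for the table rows; LastPerDomain, ByTake and BySkip are illustrative names. Since Enumerable.Skip yields every element when its count is zero or negative, groups smaller than m are kept whole in the second variant as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LastPerDomain
{
    static string Domain(string email) => email.Substring(email.IndexOf('@') + 1);

    // Variant 1: the m highest indexes of each domain, then back to ascending order.
    public static List<(int Index, string Email)> ByTake(IReadOnlyCollection<(int Index, string Email)> input)
    {
        int m = (int)(input.Count * 0.2);
        return input.GroupBy(x => Domain(x.Email),
                             (key, g) => g.OrderByDescending(x => x.Index).Take(m)
                                          .OrderBy(x => x.Index))
                    .SelectMany(g => g)
                    .ToList();
    }

    // Variant 2: sort ascending and skip everything but the last m.
    public static List<(int Index, string Email)> BySkip(IReadOnlyCollection<(int Index, string Email)> input)
    {
        int m = (int)(input.Count * 0.2);
        return input.GroupBy(x => Domain(x.Email),
                             (key, g) => g.OrderBy(x => x.Index).Skip(g.Count() - m))
                    .SelectMany(g => g)
                    .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] emails = { "a@gmail.com", "b@yahoo.com", "c@gmail.com", "d@gmail.com", "e@yahoo.com",
                            "f@gmail.com", "g@hotmail.com", "h@yahoo.com", "i@yahoo.com", "j@gmail.com" };
        var input = emails.Select((e, i) => (Index: i, Email: e)).ToList();

        // 10 rows -> m = 2; both lines print 5, 9, 7, 8, 6
        Console.WriteLine(string.Join(", ", LastPerDomain.ByTake(input).Select(x => x.Index)));
        Console.WriteLine(string.Join(", ", LastPerDomain.BySkip(input).Select(x => x.Index)));
    }
}
```

The groups come out in the order their first row appears in the input (gmail.com, yahoo.com, hotmail.com), and the lone hotmail.com row survives because its group is smaller than m.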

Answer 2 (score: 0)

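    int maxCount = (int)(db.Users.Count() * 0.2);   // Take() needs an int, not a double
    var query = (from u in db.Users
                 // domain = text after '@' (string.Split cannot be translated to SQL)
                 group u by u.Email.Substring(u.Email.IndexOf("@") + 1) into g
                 select new
                 {
                     Domain = g.Key,
                     Users = g.Take(maxCount).ToList()
                 })
                .SelectMany(x => x.Users);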
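A minimal in-memory sketch of this approach, with maxCount computed as an int because Take(int) requires one. The User class and CappedUsers.Cap are hypothetical stand-ins, and the OrderBy(u => u.Index) is an addition not in the answer: without an ordering, which rows Take returns from each group is not guaranteed once the query runs against SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Users table.
public class User
{
    public int Index { get; set; }
    public string Email { get; set; }
}

public static class CappedUsers
{
    // At most 20% of all users from each domain, flattened into one list.
    public static List<User> Cap(IReadOnlyCollection<User> users)
    {
        int maxCount = users.Count * 20 / 100;   // an int, as Take(int) requires

        return users
            .GroupBy(u => u.Email.Substring(u.Email.IndexOf('@') + 1))
            // lowest indexes first, so the rows kept are deterministic
            .SelectMany(g => g.OrderBy(u => u.Index).Take(maxCount))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] emails = { "a@gmail.com", "b@gmail.com", "c@gmail.com", "d@yahoo.com", "e@gmail.com" };
        var users = emails.Select((e, i) => new User { Index = i, Email = e }).ToList();

        // 5 users -> maxCount = 1: one gmail.com row (Index 0) and the yahoo.com row (Index 3)
        foreach (User u in CappedUsers.Cap(users))
            Console.WriteLine($"{u.Index} {u.Email}");
    }
}
```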