Question

Consider this sample table:

Col1 , Col2, Col3
1    , x   , G
1    , y   , H
2    , z   , J
2    , a   , K
2    , a   , K
3    , b   , E

I want the following result, i.e. the distinct rows:

1    , x   , G
1    , y   , H
2    , z   , J
2    , a   , K
3    , b   , E

I have tried

var Result = Context.Table.Select(C => 
                 new { 
                       Col1 = C.Col1,
                       Col2 = C.Col2,
                       Col3 = C.Col3 
                      }).Distinct();

and

Context.Table.GroupBy(x=>new {x.Col1,x.Col2,x.Col3}).Select(x=>x.First()).ToList();

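The results are as expected, but my table has 35 columns and 1 million records, and it will keep growing. The query currently takes 22-30 seconds, so how can I improve performance and bring it down to 2-3 seconds?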

Answer 1

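Using Distinct is the way to go. I would say the first approach you tried is the right one, but do you really need all 1 million rows? See whether you can add a where condition, or fetch only the first x records: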

var Result = Context.Table.Select(c => new 
    { 
        Col1 = c.Col1,
        Col2 = c.Col2,
        Col3 = c.Col3 
    })
    .Where(c => true /* replace with a condition that narrows the results */)
    .Take(1000) // the number of records you actually need
    .Distinct();
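
One thing to keep in mind with the query above is that the order of Take and Distinct matters: Take followed by Distinct removes duplicates only from the rows already taken, so it can return fewer rows than requested. A minimal in-memory sketch using the sample rows from the question (LINQ to Objects rather than a database query; the tuple rows and the class name are assumptions for illustration):

```csharp
using System;
using System.Linq;

public static class TakeDistinctDemo
{
    // The six sample rows from the question, as value tuples (compared by value).
    public static readonly (int Col1, string Col2, string Col3)[] Rows =
    {
        (1, "x", "G"), (1, "y", "H"), (2, "z", "J"),
        (2, "a", "K"), (2, "a", "K"), (3, "b", "E"),
    };

    public static void Main()
    {
        // Take first, then Distinct: the duplicate is dropped from the 5 rows taken.
        var takeThenDistinct = Rows.Take(5).Distinct().ToList();

        // Distinct first, then Take: 5 unique rows come back.
        var distinctThenTake = Rows.Distinct().Take(5).ToList();

        Console.WriteLine(takeThenDistinct.Count);  // 4
        Console.WriteLine(distinctThenTake.Count);  // 5
    }
}
```

If the goal is "the first 1000 distinct rows", Distinct would normally come before Take.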

What you can also do is use rownum to select in batches. Something like:

public <return type> RetrieveBulk(int fromRow, int toRow)
{
    return Context.Table.Where(record => record.Rownum >= fromRow && record.Rownum < toRow)
        .Select(c => new 
        { 
            Col1 = c.Col1,
            Col2 = c.Col2,
            Col3 = c.Col3 
        }).Distinct();
}

This code can then be used as follows:

List<Task<return type>> selectTasks = new List<Task<return type>>();
for (int i = 0; i < 1000000; i += 1000)
{
    int fromRow = i; // copy the loop variable so each lambda captures its own value
    selectTasks.Add(Task.Run(() => RetrieveBulk(fromRow, fromRow + 1000)));
}

Task.WaitAll(selectTasks.ToArray());

// Then combine the batch results using an efficient structure such as a HashSet, so that merging them is O(n) rather than O(n^2)
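
The combining step mentioned in the comment above could look roughly like the sketch below. It assumes each batch has already been materialized in memory as value tuples (which compare by value); `BatchMerge` and `MergeDistinct` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class BatchMerge
{
    // Unions the per-batch results into a single set of unique rows.
    // HashSet insertion is O(1) on average, so the merge is linear in the total row count.
    public static HashSet<T> MergeDistinct<T>(IEnumerable<IEnumerable<T>> batches)
    {
        var unique = new HashSet<T>();
        foreach (var batch in batches)
        {
            unique.UnionWith(batch);
        }
        return unique;
    }

    public static void Main()
    {
        // Two batches built from the sample rows; (2, a, K) appears twice.
        var batch1 = new[] { (1, "x", "G"), (1, "y", "H"), (2, "z", "J") };
        var batch2 = new[] { (2, "a", "K"), (2, "a", "K"), (3, "b", "E") };

        var merged = MergeDistinct(new[] { batch1, batch2 });
        Console.WriteLine(merged.Count); // 5
    }
}
```

Rows that are duplicated across different batches are only removed at this stage, since each batch's own Distinct sees just its own row range.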
