获取表中所有记录的样本

时间:2011-12-19 20:52:02

标签: c# .net linq entity-framework linq-to-entities

我有一张表中有几百万条记录。我想得到一个抽样,我将返回表格中的每个第n条记录。我目前的解决方案如下:

myQuery.Where((rec, index) => index % interval == 0);

然而,Linq to Entities不支持此并抛出异常。我也试过这个:

myQuery.Select((rec, index) => new { Index = index, Record = rec })
       .Where(x => x.Index % interval == 0);

但Linq to Entities也不支持。

即使它以这种方式工作,它似乎也不是很优雅。还有另一种方法吗?

5 个答案:

答案 0 :(得分:1)

此处有一个类似的Linq to SQL讨论LINQ to SQL Every Nth Row From Table,请查看此内容。

我认为SQL真的是要走的路。在EF中,您可以将SQL放在DefiningQuery

答案 1 :(得分:1)

似乎您没有定义n是表的索引的顺序。由于表中的数据是无序的,没有指定顺序,记录号是没有意义的,因此n记录是未定义的。这是否意味着您只需要一定的随机百分比记录?如果是这样的话,给你一个随机百分之一的数据样本的例子就是这样的:

Customer[] onePercentSample = db.Customers.Take(db.Customer.Count() / 100).ToArray();

实际上并不保证这是随机样本。它只是从数据库中的所有记录中返回一个未定义的样本,这些记录在多个查询中可能是也可能不是相同的集合。同样,这是由于表的性质是无序记录集。

答案 2 :(得分:0)

尝试按索引%interval排序。如果可行,您可以将第一个间隔/总记录作为样本。

答案 3 :(得分:0)

直接的SQL方式可能就是我的方式,但如果你有顺序ID,你可以利用Min和Max:

注意我用Linqpad测试了这个,它使用linq到sql - 但我认为结果在EF中是相同的。

var a = ReceivedPayments.Select(c=>c.Id);
int interval = 50;
var ids = new List<int>();
int min = a.Min();
int max = a.Max();
for (int i = min; i < max; i++)
{
    if((i % interval) == 0)
    {
        ids.Add(i);
    } 
}

var b = ReceivedPayments.Where(c=>ids.Contains(c.Id));

生成Sql

SELECT MIN([t0].[Id]) AS [value]
FROM [ReceivedPayments] AS [t0]
GO

SELECT MAX([t0].[Id]) AS [value]
FROM [ReceivedPayments] AS [t0]
GO

-- Region Parameters
DECLARE @p0 Int = 50
DECLARE @p1 Int = 100
DECLARE @p2 Int = 150
DECLARE @p3 Int = 200
DECLARE @p4 Int = 250
DECLARE @p5 Int = 300
DECLARE @p6 Int = 350
DECLARE @p7 Int = 400
DECLARE @p8 Int = 450
DECLARE @p9 Int = 500
DECLARE @p10 Int = 550
DECLARE @p11 Int = 600
DECLARE @p12 Int = 650
DECLARE @p13 Int = 700
DECLARE @p14 Int = 750
DECLARE @p15 Int = 800
DECLARE @p16 Int = 850
DECLARE @p17 Int = 900
DECLARE @p18 Int = 950
DECLARE @p19 Int = 1000
DECLARE @p20 Int = 1050
DECLARE @p21 Int = 1100
DECLARE @p22 Int = 1150
DECLARE @p23 Int = 1200
DECLARE @p24 Int = 1250
DECLARE @p25 Int = 1300
DECLARE @p26 Int = 1350
DECLARE @p27 Int = 1400
DECLARE @p28 Int = 1450
DECLARE @p29 Int = 1500
DECLARE @p30 Int = 1550
DECLARE @p31 Int = 1600
DECLARE @p32 Int = 1650
DECLARE @p33 Int = 1700
DECLARE @p34 Int = 1750
DECLARE @p35 Int = 1800
DECLARE @p36 Int = 1850
DECLARE @p37 Int = 1900
DECLARE @p38 Int = 1950
DECLARE @p39 Int = 2000
DECLARE @p40 Int = 2050
DECLARE @p41 Int = 2100
DECLARE @p42 Int = 2150
DECLARE @p43 Int = 2200
DECLARE @p44 Int = 2250
DECLARE @p45 Int = 2300
DECLARE @p46 Int = 2350
DECLARE @p47 Int = 2400
DECLARE @p48 Int = 2450
DECLARE @p49 Int = 2500
DECLARE @p50 Int = 2550
DECLARE @p51 Int = 2600
DECLARE @p52 Int = 2650
DECLARE @p53 Int = 2700
DECLARE @p54 Int = 2750
DECLARE @p55 Int = 2800
DECLARE @p56 Int = 2850
DECLARE @p57 Int = 2900
DECLARE @p58 Int = 2950
DECLARE @p59 Int = 3000
DECLARE @p60 Int = 3050
DECLARE @p61 Int = 3100
DECLARE @p62 Int = 3150
DECLARE @p63 Int = 3200
DECLARE @p64 Int = 3250
DECLARE @p65 Int = 3300
DECLARE @p66 Int = 3350
DECLARE @p67 Int = 3400
DECLARE @p68 Int = 3450
DECLARE @p69 Int = 3500
DECLARE @p70 Int = 3550
DECLARE @p71 Int = 3600
DECLARE @p72 Int = 3650
DECLARE @p73 Int = 3700
DECLARE @p74 Int = 3750
DECLARE @p75 Int = 3800
DECLARE @p76 Int = 3850
DECLARE @p77 Int = 3900
DECLARE @p78 Int = 3950
DECLARE @p79 Int = 4000
DECLARE @p80 Int = 4050
DECLARE @p81 Int = 4100
DECLARE @p82 Int = 4150
DECLARE @p83 Int = 4200
DECLARE @p84 Int = 4250
DECLARE @p85 Int = 4300
DECLARE @p86 Int = 4350
DECLARE @p87 Int = 4400
DECLARE @p88 Int = 4450
DECLARE @p89 Int = 4500
DECLARE @p90 Int = 4550
DECLARE @p91 Int = 4600
DECLARE @p92 Int = 4650
DECLARE @p93 Int = 4700
DECLARE @p94 Int = 4750
DECLARE @p95 Int = 4800
DECLARE @p96 Int = 4850
DECLARE @p97 Int = 4900
DECLARE @p98 Int = 4950
DECLARE @p99 Int = 5000
DECLARE @p100 Int = 5050
DECLARE @p101 Int = 5100
DECLARE @p102 Int = 5150
DECLARE @p103 Int = 5200
DECLARE @p104 Int = 5250
DECLARE @p105 Int = 5300
DECLARE @p106 Int = 5350
DECLARE @p107 Int = 5400
DECLARE @p108 Int = 5450
DECLARE @p109 Int = 5500
DECLARE @p110 Int = 5550
DECLARE @p111 Int = 5600
DECLARE @p112 Int = 5650
DECLARE @p113 Int = 5700
DECLARE @p114 Int = 5750
DECLARE @p115 Int = 5800
DECLARE @p116 Int = 5850
DECLARE @p117 Int = 5900
DECLARE @p118 Int = 5950
DECLARE @p119 Int = 6000
DECLARE @p120 Int = 6050
DECLARE @p121 Int = 6100
DECLARE @p122 Int = 6150
DECLARE @p123 Int = 6200
-- EndRegion
SELECT [t0].[Id], [t0].[TxnID], [t0].[TxnDate], [t0].[TotalAmount], [t0].[Memo], [t0].[AppliedToTxnTxnID], [t0].[AppliedToTxnTxnType], [t0].[AppliedToTxnAmount], [t0].[FQPrimaryKey], [t0].[RefNumber], [t0].[ARAccount_Id], [t0].[Customer_Id]
FROM [ReceivedPayments] AS [t0]
WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100, @p101, @p102, @p103, @p104, @p105, @p106, @p107, @p108, @p109, @p110, @p111, @p112, @p113, @p114, @p115, @p116, @p117, @p118, @p119, @p120, @p121, @p122, @p123)

答案 4 :(得分:-1)

你可以随时使用Skip and Take。尽管如此,打赌它的效率非常低。

int n = 10;
int currentIndex = 0;
yourEntity current;
List<yourEntity> lstSampling = new List<yourEntity>();

while((current = context.yourEntities.Skip(currentIndex).Take(1).FirstOrDefault()) != null)
{
    listSampling.Add(current);
    currentIndex += n;
}