Select all rows with 10 distinct values

Date: 2014-07-17 12:09:27

Tags: sql-server tsql distinct

I have a very large (100,000,000+ rows), poorly performing table in the following format:

CREATE TABLE [dbo].[Emails](
    [pkid] [int] IDENTITY(1,1) NOT NULL,
    [cid] [int] NULL,
    [email] [nvarchar](255) NULL,
    [lid] [int] NULL,
    [date] [smalldatetime] NULL
)

There is a clustered index on pkid, a unique non-clustered index on cid + email + lid, and a non-unique non-clustered index on just lid.
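The sketch below makes the index layout concrete. It recreates the table and its indexes in SQLite through Python's sqlite3 module and assumes that the unique index covers (cid, email, lid). SQLite only stands in for SQL Server here. Its INTEGER PRIMARY KEY rowid loosely corresponds to the clustered IDENTITY key, and the index names ux_emails_cid_email_lid and ix_emails_lid are hypothetical. The code asks the planner how it would answer a per-cid lookup. A composite index with cid as its leading column should let it seek rather than scan.

```python
import sqlite3

# Hypothetical SQLite stand-in for the SQL Server table and indexes described above.
# Assumption: the unique non-clustered index covers (cid, email, lid).
SCHEMA = """
CREATE TABLE Emails (
    pkid  INTEGER PRIMARY KEY AUTOINCREMENT,  -- rowid key, roughly the clustered IDENTITY
    cid   INTEGER,
    email TEXT,
    lid   INTEGER,
    date  TEXT
);
CREATE UNIQUE INDEX ux_emails_cid_email_lid ON Emails (cid, email, lid);
CREATE INDEX ix_emails_lid ON Emails (lid);
"""


def plan_for_cid_lookup(conn: sqlite3.Connection) -> list[str]:
    """Return SQLite's query-plan detail lines for a per-cid email lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT email, lid FROM Emails WHERE cid = ?", (123,)
    ).fetchall()
    # Each plan row has four columns; the last one is the human-readable detail.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(plan_for_cid_lookup(conn))
```

SQLite typically reports a covering-index search on the composite index here. SQL Server's plan output looks different, but it relies on the same leading-column idea.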

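For a given cid, there can be multiple entries with the same email and different lid values, for example:

cid, email, lid, date
123, mal@serenity.fake, 456, 2014-07-17 12:21:00
123, mal@serenity.fake, 459, 2014-07-17 12:26:00
123, mal@serenity.fake, 466, 2014-07-17 12:27:00
123, zoe@serenity.fake, 456, 2014-07-17 12:21:00
123, zoe@serenity.fake, 467, 2014-07-17 12:28:00

For a given cid, I want to pull the first 10 unique emails and concatenate the rest of the data into a list of lids and either a list of dates or a single minimum-date field. I also want to return the pkid of the last data row selected, for paging.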
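The next sketch runs the grouping step on the sample rows above, using SQLite through Python's sqlite3 rather than SQL Server. It rests on several assumptions. "First" is taken to mean ordered by email. The paging marker is taken to be the highest pkid for each email (MAX(pkid)). GROUP_CONCAT is SQLite's string aggregate, and it does not guarantee the order of the values it joins. The helper first_unique_emails is hypothetical.

```python
import sqlite3

# The five sample rows from the question; pkid is assigned 1..5 in insertion order.
SAMPLE = [
    (123, "mal@serenity.fake", 456, "2014-07-17 12:21:00"),
    (123, "mal@serenity.fake", 459, "2014-07-17 12:26:00"),
    (123, "mal@serenity.fake", 466, "2014-07-17 12:27:00"),
    (123, "zoe@serenity.fake", 456, "2014-07-17 12:21:00"),
    (123, "zoe@serenity.fake", 467, "2014-07-17 12:28:00"),
]

# One row per distinct email for a cid: lids and dates joined into strings,
# the earliest date, and the highest pkid (assumed paging marker).
FIRST_PAGE_SQL = """
SELECT email,
       GROUP_CONCAT(lid, ',')  AS lids,
       GROUP_CONCAT(date, ',') AS dates,
       MIN(date)               AS min_date,
       MAX(pkid)               AS last_pkid
FROM Emails
WHERE cid = ?
GROUP BY email
ORDER BY email
LIMIT ?
"""


def first_unique_emails(conn: sqlite3.Connection, cid: int, limit: int = 10):
    """Return up to `limit` (email, lids, dates, min_date, last_pkid) tuples."""
    return conn.execute(FIRST_PAGE_SQL, (cid, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emails (pkid INTEGER PRIMARY KEY, cid INTEGER, email TEXT, lid INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO Emails (cid, email, lid, date) VALUES (?, ?, ?, ?)", SAMPLE)

for row in first_unique_emails(conn, 123):
    print(row)
```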

So, for example, the result would look like this:

mal@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
zoe@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
wash@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
jane@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
kaylee@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
inara@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
book@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
simon@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
river@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
early@intruder.fake, [list of lids], [list of dates], [last pkid for all rows]

What is the most efficient way to do this? My current implementation seems very inefficient, and I'd like to improve it.

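With the current implementation, viewing page 5 of these results means selecting all 5 pages (50 emails, possibly 500+ rows of data) and then taking the last 10. There must be a better way!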
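One common alternative to re-reading every earlier page is keyset ("seek") pagination. Neither the question nor the answer uses it, so it is a swapped-in technique here. The idea is to remember the last email of the previous page and request only the emails after it. An index that starts with (cid, email) can answer that with a range seek. The sketch below assumes that pages are ordered by email. It uses SQLite through Python's sqlite3 as a stand-in, and the helpers fetch_page and iter_pages, along with the generated user##@serenity.fake rows, are hypothetical.

```python
import sqlite3

# Keyset paging: seek past the last email already shown instead of counting rows.
PAGE_SQL = """
SELECT email, GROUP_CONCAT(lid, ',') AS lids, MIN(date) AS min_date
FROM Emails
WHERE cid = ? AND email > ?
GROUP BY email
ORDER BY email
LIMIT ?
"""


def fetch_page(conn: sqlite3.Connection, cid: int, after_email: str, page_size: int = 10):
    """Return the next page of emails that sort strictly after `after_email`."""
    return conn.execute(PAGE_SQL, (cid, after_email, page_size)).fetchall()


def iter_pages(conn: sqlite3.Connection, cid: int, page_size: int = 10):
    """Yield pages in order; each page's last email is the next page's seek key."""
    after = ""  # empty string sorts before any non-empty email
    while True:
        page = fetch_page(conn, cid, after, page_size)
        if not page:
            return
        yield page
        after = page[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emails (pkid INTEGER PRIMARY KEY, cid INTEGER, email TEXT, lid INTEGER, date TEXT)"
)
conn.execute("CREATE UNIQUE INDEX ux_emails_cid_email_lid ON Emails (cid, email, lid)")
# Hypothetical data: 25 distinct emails for cid 123, two lids each.
data = [
    (123, f"user{i:02d}@serenity.fake", lid, "2014-07-17 12:00:00")
    for i in range(25)
    for lid in (1, 2)
]
conn.executemany("INSERT INTO Emails (cid, email, lid, date) VALUES (?, ?, ?, ?)", data)

pages = list(iter_pages(conn, 123))
print([len(p) for p in pages])  # [10, 10, 5]
```

The trade-off is that this approach cannot jump straight to page 5 without the key that ends page 4. In return, each page costs about the same no matter how deep it is. Ordering by a per-email aggregate such as MIN(date) would need a different, indexable seek key.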

1 answer:

Answer 0 (score: 0)

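How about something like this? I haven't built a million-row test table for it, but the execution plan is simpler. Building the CSV will still hurt, because it forces a second table scan whatever you do.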

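WITH MyRows AS
(
    -- One row per distinct email for @cid, newest first; TOP keeps only the pages needed
    SELECT TOP (@rows * @page)
        email,
        MIN([date]) AS [date],
        COUNT(lid) AS [lids],
        -- Concatenate "lid;date" pairs for this email into one comma-separated string
        [ids] = STUFF(
            (
                SELECT ',' + CAST(lid AS VARCHAR(11)) + ';' + CAST([date] AS VARCHAR(20))
                FROM Emails WITH (NOLOCK)
                WHERE cid = @cid
                AND email = Z.email
                ORDER BY [date]
                FOR XML PATH('')
            ), 1, 1, ''),
        -- Number the grouped rows across the whole result; PARTITION BY email
        -- would restart at 1 for every email, since each group is a single row
        ROW_NUMBER() OVER (ORDER BY MIN([date]) DESC) AS RowNum
    FROM Emails Z WITH (NOLOCK)
    WHERE cid = @cid
    GROUP BY email
    ORDER BY MIN([date]) DESC
)

SELECT email
    , [date]
    , lids
    , ids
FROM MyRows
-- @page is 1-based: page 1 covers RowNum 1..@rows
WHERE RowNum > @rows * (@page - 1)
AND RowNum <= @rows * @page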
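The query above is T-SQL for SQL Server. The sketch below moves the same paging arithmetic to SQLite through Python's sqlite3 so that the page bounds can be checked; it says nothing about SQL Server performance. It aggregates per email first and then numbers the groups once across the whole result. Using PARTITION BY email would give every email RowNum 1, because each group is already a single row. It keeps rows with RowNum in (rows * (page - 1), rows * page] for a 1-based page. The email tie-breaker in the window ORDER BY, the fetch_page helper, and the generated data are assumptions, and ROW_NUMBER requires SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical SQLite port of the paging idea: aggregate per email first,
# then number the groups once across the whole result (no PARTITION BY).
PAGED_SQL = """
WITH Grouped AS (
    SELECT email, MIN(date) AS first_date, COUNT(lid) AS lids
    FROM Emails
    WHERE cid = :cid
    GROUP BY email
),
MyRows AS (
    SELECT email, first_date, lids,
           ROW_NUMBER() OVER (ORDER BY first_date DESC, email) AS RowNum
    FROM Grouped
)
SELECT email, first_date, lids
FROM MyRows
WHERE RowNum > :rows * (:page - 1)   -- 1-based page: page 1 keeps RowNum 1..rows
  AND RowNum <= :rows * :page
ORDER BY RowNum
"""


def fetch_page(conn: sqlite3.Connection, cid: int, page: int, rows: int = 10):
    """Return one 1-based page of (email, first_date, lid_count) tuples."""
    params = {"cid": cid, "page": page, "rows": rows}
    return conn.execute(PAGED_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emails (pkid INTEGER PRIMARY KEY, cid INTEGER, email TEXT, lid INTEGER, date TEXT)"
)
# Hypothetical data: 25 emails, each with lid 1 at 12:MM and lid 2 at 13:MM.
data = [
    (123, f"user{i:02d}@serenity.fake", lid, f"2014-07-17 {11 + lid}:{i:02d}:00")
    for i in range(25)
    for lid in (1, 2)
]
conn.executemany("INSERT INTO Emails (cid, email, lid, date) VALUES (?, ?, ?, ?)", data)

print(fetch_page(conn, 123, page=3))
```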