I have a very large (100,000,000+ rows), poorly performing table in the following format:
CREATE TABLE [dbo].[Emails](
[pkid] [int] IDENTITY(1,1) NOT NULL,
[cid] [int] NULL,
[email] [nvarchar](255) NULL,
[lid] [int] NULL,
[date] [smalldatetime] NULL
)
[There is a clustered index on pkid, a unique non-clustered index on pkid + cid + email, and a non-unique non-clustered index on lid only]
For a given cid, there can be multiple entries with the same email and different lid values, for example:
cid, email, lid, date
123, mal@serenity.fake, 456, 2014-07-17 12:21:00
123, mal@serenity.fake, 459, 2014-07-17 12:26:00
123, mal@serenity.fake, 466, 2014-07-17 12:27:00
123, zoe@serenity.fake, 456, 2014-07-17 12:21:00
123, zoe@serenity.fake, 467, 2014-07-17 12:28:00
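For a given cid, I want to pull the top 10 unique emails and combine the rest of each email's data into a list of lids, along with either a list of dates or a single minimum-date field.
I also want to return the pkid of the last data row selected, for paging.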
So, for example, one page of results would look like:
mal@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
zoe@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
wash@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
jane@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
kaylee@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
inara@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
book@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
simon@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
river@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
early@intruder.fake, [list of lids], [list of dates], [last pkid for all rows]
What is the most efficient way to do this? My current implementation seems very inefficient, and I'd like to improve it.
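With my current approach, if I want to view page 5 of these results, it selects all 5 pages (50 emails, possibly 500+ rows of data) and then takes the last 10 from the bottom. There has to be a better way to do this!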
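One common way to avoid re-reading every earlier page is keyset (seek) pagination: instead of a page number, the client passes back the last key it saw, and the next query starts strictly after that key. The sketch below is only an illustration under stated assumptions. It uses Python's built-in sqlite3 module with an in-memory SQLite table as a stand-in for dbo.Emails, so TOP becomes LIMIT and SQLite's group_concat stands in for SQL Server's FOR XML PATH or STRING_AGG. It assumes pages are ordered by email (the question does not fix an order) and that an index leading with cid, email exists. The names make_db, load_sample, fetch_page, PAGE_SQL and ix_cid_email are hypothetical. Because each page is grouped by email, the cursor is the last email on the page rather than a pkid; MAX(pkid) is still returned for each email.

```python
import sqlite3

# Rows from the question's example: (cid, email, lid, date).
SAMPLE_ROWS = [
    (123, "mal@serenity.fake", 456, "2014-07-17 12:21:00"),
    (123, "mal@serenity.fake", 459, "2014-07-17 12:26:00"),
    (123, "mal@serenity.fake", 466, "2014-07-17 12:27:00"),
    (123, "zoe@serenity.fake", 456, "2014-07-17 12:21:00"),
    (123, "zoe@serenity.fake", 467, "2014-07-17 12:28:00"),
]

# One page of distinct emails for a cid, starting strictly after the cursor.
# Note: SQLite does not guarantee the element order inside group_concat.
PAGE_SQL = """
SELECT email,
       group_concat(lid, ',')  AS lids,
       group_concat(date, ',') AS dates,
       MIN(date)               AS first_date,
       MAX(pkid)               AS last_pkid
FROM Emails
WHERE cid = :cid AND email > :after
GROUP BY email
ORDER BY email
LIMIT :rows
"""


def make_db():
    """Create a hypothetical in-memory stand-in for dbo.Emails."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emails (pkid INTEGER PRIMARY KEY, cid INTEGER, "
        "email TEXT, lid INTEGER, date TEXT)"
    )
    # Assumed index on (cid, email), usable for the cid = ... AND email > ... range.
    conn.execute("CREATE INDEX ix_cid_email ON Emails (cid, email)")
    return conn


def load_sample(conn):
    """Insert the example rows; pkid is assigned 1..5 in insertion order."""
    conn.executemany(
        "INSERT INTO Emails (cid, email, lid, date) VALUES (?, ?, ?, ?)",
        SAMPLE_ROWS,
    )


def fetch_page(conn, cid, after_email="", rows=10):
    """Return (page, cursor); pass cursor back as after_email for the next page.

    An empty after_email starts at the first page; rows with a NULL email are skipped.
    """
    page = conn.execute(
        PAGE_SQL, {"cid": cid, "after": after_email, "rows": rows}
    ).fetchall()
    cursor = page[-1][0] if page else None
    return page, cursor


if __name__ == "__main__":
    conn = make_db()
    load_sample(conn)
    page, cursor = fetch_page(conn, 123, rows=1)
    while page:
        print(page)
        page, cursor = fetch_page(conn, 123, cursor, rows=1)
```

With a suitable index, each call can start reading at the cursor position instead of walking through every earlier page. The trade-off is that clients move forward from a known position and cannot jump directly to an arbitrary page number.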
Answer (score: 0)
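How about something like this? I haven't built a million-row test table for it, but the execution plan is simpler. Building the CSV is going to hurt, because it forces a second table scan no matter what you do.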
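-- Example values (hypothetical): @page is 1-based, @rows emails per page
DECLARE @cid int = 123, @rows int = 10, @page int = 5;

with MyRows as
(
SELECT TOP (@rows * @page)
email,
MIN([date]) as [date],
COUNT(lid) as [lids],
[ids] = STUFF(
(
SELECT ',' + CAST(lid AS VARCHAR(11)) + ';' + CAST([date] as VARCHAR(20))
FROM Emails WITH (NOLOCK)
WHERE cid = @cid
AND email = Z.email
FOR XML PATH('')
), 1, 1, '')
-- Number across all grouped emails; PARTITION BY email would reset to 1 for every group
, ROW_NUMBER() over(order by MIN([date]) desc, email) as RowNum
FROM Emails Z WITH (NOLOCK)
WHERE cid = @cid
GROUP BY email
-- Same ordering as ROW_NUMBER, so TOP keeps exactly the first @page pages
ORDER by MIN([date]) DESC, email
)
select email
, [date]
, lids
, ids
from MyRows
-- Page n holds RowNum (n - 1) * @rows + 1 through n * @rows
where RowNum > @rows * (@page - 1)
and RowNum <= @rows * @page
order by RowNum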
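The paging window is the easiest part to get wrong. With ROW_NUMBER() numbered from 1 over the grouped emails (not partitioned by email, which would give every grouped row the number 1) and a 1-based page number, page n covers RowNum values (n - 1) * rows + 1 through n * rows. As a rough check of that arithmetic, here is a sketch using Python's sqlite3 module with an in-memory SQLite table as a stand-in for dbo.Emails. It also builds the lid;date list with group_concat inside the same GROUP BY instead of a correlated subquery. On SQL Server 2017 and later, STRING_AGG is the analogous aggregate; older versions still need the FOR XML PATH subquery. The names make_sample_db, fetch_page_by_number and PAGE_SQL are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Aggregate per email first, then number the grouped rows, then keep one page.
# Ties on first_date are broken by email so page boundaries are stable.
PAGE_SQL = """
WITH Grouped AS (
    SELECT email,
           MIN(date)  AS first_date,
           COUNT(lid) AS lid_count,
           group_concat(lid || ';' || date, ',') AS ids
    FROM Emails
    WHERE cid = :cid
    GROUP BY email
), Numbered AS (
    SELECT email, first_date, lid_count, ids,
           ROW_NUMBER() OVER (ORDER BY first_date DESC, email) AS RowNum
    FROM Grouped
)
SELECT email, first_date, lid_count, ids
FROM Numbered
WHERE RowNum > :rows * (:page - 1) AND RowNum <= :rows * :page
ORDER BY RowNum
"""


def make_sample_db():
    """Hypothetical in-memory stand-in for dbo.Emails, loaded with the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emails (pkid INTEGER PRIMARY KEY, cid INTEGER, "
        "email TEXT, lid INTEGER, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO Emails (cid, email, lid, date) VALUES (?, ?, ?, ?)",
        [
            (123, "mal@serenity.fake", 456, "2014-07-17 12:21:00"),
            (123, "mal@serenity.fake", 459, "2014-07-17 12:26:00"),
            (123, "mal@serenity.fake", 466, "2014-07-17 12:27:00"),
            (123, "zoe@serenity.fake", 456, "2014-07-17 12:21:00"),
            (123, "zoe@serenity.fake", 467, "2014-07-17 12:28:00"),
        ],
    )
    return conn


def fetch_page_by_number(conn, cid, page, rows=10):
    """Return page `page` (1-based) of grouped emails for `cid`."""
    return conn.execute(
        PAGE_SQL, {"cid": cid, "page": page, "rows": rows}
    ).fetchall()
```

For example, with rows=1 both sample emails share the same first date, so the email tie-breaker puts mal@serenity.fake on page 1 and zoe@serenity.fake on page 2, and page 3 is empty. This approach still numbers every grouped email for the cid before filtering, so it fixes correctness rather than the cost of reaching deep pages.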