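I have seven large tables that can hold anywhere from 100 to 1 million rows at any time. I'll call them LargeTable1, LargeTable2, LargeTable3, LargeTable4 ... LargeTable7. These tables are mostly static: there are no updates and no new inserts. They change only once every two weeks or once a month, when they are truncated and a new batch of records is inserted into each of them.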
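All of these tables have three fields in common: Headquarter, Country and File. Headquarter and Country are numbers in the format '000', but in two of the tables they are parsed as int because some other system needs them that way.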
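I have another, much smaller table called Headquarters, which holds the information for each headquarter. This table has very few entries: at most 1000, actually.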
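Now I need to create a stored procedure that returns all the headquarters that appear in the large tables but either do not exist in the Headquarters table or have been deleted (this table is deleted logically: it has a DeletionDate field to check this).
This is the query I tried:
CREATE PROCEDURE deletedHeadquarters
AS
BEGIN
    -- One row per distinct headquarter / "country (file)" pair found in the large tables
    DECLARE @headquartersFiles TABLE
    (
        headquarter int,           -- name must match the references further down
        countryFile varchar(MAX)
    );
    SET NOCOUNT ON;
    INSERT INTO @headquartersFiles
    SELECT headquarter, CONCAT(country, ' (', [file], ')')
    FROM
    (
        -- FILE is a reserved word in T-SQL, so the column is bracketed
        SELECT DISTINCT CONVERT(int, headquarter) as headquarter,
               CONVERT(int, country) as country,
               [file]
        FROM LargeTable1
        UNION
        SELECT DISTINCT headquarter,
               country,
               [file]
        FROM LargeTable2
        UNION
        SELECT DISTINCT headquarter,
               country,
               [file]
        FROM LargeTable3
        UNION
        SELECT DISTINCT headquarter,
               country,
               [file]
        FROM LargeTable4
        UNION
        SELECT DISTINCT headquarter,
               country,
               [file]
        FROM LargeTable5
        UNION
        SELECT DISTINCT headquarter,
               country,
               [file]
        FROM LargeTable6
        UNION
        SELECT DISTINCT headquarter,
               country,
               [file]
        FROM LargeTable7
    ) TC;
    -- Headquarters that are missing from, or logically deleted in, the headquarters table
    SELECT RIGHT('000' + CAST(st.headquarter AS VARCHAR(3)), 3) as headquarter,
           MAX(s.deletionDate) as deletionDate,
           STUFF
           (
               (SELECT DISTINCT ', ' + st2.countryFile
                FROM @headquartersFiles st2
                WHERE st2.headquarter = st.headquarter
                FOR XML PATH('')),
               1,
               2,   -- strip the leading ', ' (comma and space)
               ''
           ) countryFile
    FROM @headquartersFiles as st
    LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
    WHERE s.headquarter IS NULL
       OR s.deletionDate IS NOT NULL
    GROUP BY st.headquarter
END
The performance of this sp isn't good enough for our application. It currently takes around 50 seconds to complete, with these total row counts for each table (to give you an idea of the sizes):
What can I do to improve performance? I have tried the following, without much difference:
I also thought about inserting those missing headquarters into a permanent table after the LargeTables change, but the Headquarters table can change more often, and I would rather not change its module to keep these things tidy and up to date. But if it's the best possible alternative, I'll go for it.
Thanks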
Answer 0 (score: 2)
Take this filter:
LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
WHERE s.headquarter IS NULL
OR s.deletionDate IS NOT NULL
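and add it to each individual query in the union that inserts into @headquartersFiles.
It may look like this adds more filtering work, but it actually speeds things up, because the rows are filtered before they are processed as a union.
Also take out all your DISTINCTs. That probably won't speed things up, but they are redundant, since you are doing a UNION (which already removes duplicates) and not a UNION ALL.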
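As a rough sketch of what that could look like, here are the first two branches with the filter pushed down and the DISTINCTs removed. It reuses the names from the question; the varchar(200) length, and the assumption that LargeTable1 stores the values as strings while LargeTable2 stores them as int, are guesses made for illustration only.

```sql
-- Sketch only: column types and lengths are assumptions, not taken from the real schema.
DECLARE @headquartersFiles TABLE
(
    headquarter int,
    countryFile varchar(200)
);

INSERT INTO @headquartersFiles (headquarter, countryFile)
-- LargeTable1: assumed to hold '000'-style strings, so convert before comparing
SELECT CONVERT(int, lt.headquarter),
       CONCAT(CONVERT(int, lt.country), ' (', lt.[file], ')')
FROM LargeTable1 AS lt
LEFT JOIN headquarters AS s
       ON CONVERT(int, s.headquarter) = CONVERT(int, lt.headquarter)
WHERE s.headquarter IS NULL
   OR s.deletionDate IS NOT NULL
UNION   -- UNION already removes duplicates, so no DISTINCT is needed
-- LargeTable2: assumed to hold int values already
SELECT lt.headquarter,
       CONCAT(lt.country, ' (', lt.[file], ')')
FROM LargeTable2 AS lt
LEFT JOIN headquarters AS s
       ON CONVERT(int, s.headquarter) = lt.headquarter
WHERE s.headquarter IS NULL
   OR s.deletionDate IS NOT NULL;
-- The remaining large tables would follow the same pattern as extra UNION branches.
```

Each branch now hands the union only the rows whose headquarter is missing or deleted, which should be a small fraction of each large table.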
Answer 1 (score: 1)
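I would first try filtering against each table. You just need to account for the fact that a headquarter might appear in one table but not in another. You can do it like this: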
SELECT
    headquarter
FROM
(
    -- LT.headquarter is qualified because both joined tables have a headquarter column
    SELECT DISTINCT
        LT.headquarter,
        'table1' AS large_table
    FROM
        LargeTable1 LT
    LEFT OUTER JOIN Headquarters HQ ON HQ.headquarter = LT.headquarter
    WHERE
        HQ.headquarter IS NULL OR
        HQ.deletion_date IS NOT NULL
    UNION ALL
    SELECT DISTINCT
        LT.headquarter,
        'table2' AS large_table
    FROM
        LargeTable2 LT
    LEFT OUTER JOIN Headquarters HQ ON HQ.headquarter = LT.headquarter
    WHERE
        HQ.headquarter IS NULL OR
        HQ.deletion_date IS NOT NULL
    UNION ALL
    ...
) SQ
GROUP BY headquarter
HAVING COUNT(*) = 5
This makes sure that it is missing for all five tables.
Answer 2 (score: 1)
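Table variables tend to perform poorly because SQL Server does not generate statistics for them. Instead of a table variable, try a temp table, and if headquarter + country + file is unique in the temp table, add a unique constraint to the temp table definition (this creates a unique index, which is nonclustered by default unless you declare it UNIQUE CLUSTERED). You can also put an index on a temp table after creating it, but for various reasons SQL Server may ignore it.
Edit: it turns out you can actually create indexes on table variables, even non-unique ones in 2014+.
Second, try not to use functions in joins or WHERE clauses. Wrapping a column in a function generally prevents SQL Server from seeking an index on that column, so doing this often causes performance problems.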
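A minimal sketch of both options follows, assuming the column names from the question. The varchar lengths are placeholders, and only two of the seven source tables are shown.

```sql
-- Temp table instead of a table variable, so SQL Server can keep statistics on it.
-- Column types and lengths are assumptions; match them to the real tables.
CREATE TABLE #headquartersFiles
(
    headquarter int,
    country     int,
    [file]      varchar(100),
    UNIQUE CLUSTERED (headquarter, country, [file])  -- unnamed, to avoid name clashes between sessions
);

INSERT INTO #headquartersFiles (headquarter, country, [file])
SELECT CONVERT(int, headquarter), CONVERT(int, country), [file]
FROM LargeTable1
UNION   -- removes duplicates, so the unique constraint is not violated
SELECT headquarter, country, [file]
FROM LargeTable2;

-- SQL Server 2014+ alternative: an inline, non-unique index on a table variable.
DECLARE @headquartersFilesVar TABLE
(
    headquarter int,
    countryFile varchar(200),
    INDEX ix_headquarter NONCLUSTERED (headquarter)
);

DROP TABLE #headquartersFiles;
```

Storing headquarter as int in the temp table also means later joins can compare it directly, without a CONVERT on this side of the join.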
Answer 3 (score: 1)
Filter at every step. But first, modify the headquarters table so that it has a column of the right type for what you need, along with an index:
alter table headquarters add headquarter_int as (cast(headquarter as int));
GO
create index idx_headquarters_int on headquarters(headquarter_int);
GO
-- Test the filter against a single table first.
-- A row is returned when no live (non-deleted) headquarter matches it.
SELECT DISTINCT headquarter, country, [file]
FROM LargeTable5 lt5
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters s
                  WHERE s.headquarter_int = lt5.headquarter and s.deletiondate is null
                 );
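Then you want an index on LargeTable5(headquarter, country, file).
This should take less than 5 seconds to run. If it does, construct the full query, making sure that the types in the correlated subquery match and that you have the right indexes on the full tables. Use union to remove duplicates between the tables.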
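The full query could then be assembled along these lines. This is a sketch under a few assumptions: the computed column headquarter_int and its index already exist, a headquarter counts as live when its deletiondate is NULL, LargeTable1 stores the values as strings and LargeTable2 as int, and only two of the seven branches are shown.

```sql
-- Sketch only: assumes headquarters.headquarter_int (computed int column) is indexed.
SELECT CONVERT(int, lt.headquarter) AS headquarter,
       CONVERT(int, lt.country)     AS country,
       lt.[file]
FROM LargeTable1 AS lt   -- assumed to store '000'-style strings
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters AS s
                  WHERE s.headquarter_int = CONVERT(int, lt.headquarter)
                    AND s.deletiondate IS NULL)   -- no live headquarter matches
UNION   -- removes duplicates between the tables
SELECT lt.headquarter,
       lt.country,
       lt.[file]
FROM LargeTable2 AS lt   -- assumed to store int values, so the types match directly
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters AS s
                  WHERE s.headquarter_int = lt.headquarter
                    AND s.deletiondate IS NULL);
-- The other large tables would be added as further UNION branches of the same shape.
```

Keeping the comparison as int against int on the headquarters side is what lets the index on headquarter_int be used.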
Answer 4 (score: 0)
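The real answer is to create a separate INSERT statement for each table, with the caveat that the data being inserted must not already exist in the destination table.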
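One way this might look, as a sketch: the target table #headquartersFiles, its column types, and the int/string split between LargeTable1 and LargeTable2 are all assumptions made for illustration.

```sql
-- Hypothetical target table; types and lengths are assumptions.
CREATE TABLE #headquartersFiles
(
    headquarter int,
    country     int,
    [file]      varchar(100)
);

-- First table: the target is empty, so DISTINCT alone is enough.
INSERT INTO #headquartersFiles (headquarter, country, [file])
SELECT DISTINCT CONVERT(int, lt.headquarter), CONVERT(int, lt.country), lt.[file]
FROM LargeTable1 AS lt;

-- Every later table: skip combinations that are already in the target.
-- Note that rows with NULLs in any of the three columns never compare equal here.
INSERT INTO #headquartersFiles (headquarter, country, [file])
SELECT DISTINCT lt.headquarter, lt.country, lt.[file]
FROM LargeTable2 AS lt
WHERE NOT EXISTS (SELECT 1
                  FROM #headquartersFiles AS t
                  WHERE t.headquarter = lt.headquarter
                    AND t.country     = lt.country
                    AND t.[file]      = lt.[file]);
```

The same second statement would be repeated for each remaining large table, changing only the source table name.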