Question

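I have a question about the execution time of a query that has me puzzled. I know there are several ways to work around the problem and get a better, acceptable execution time, but I still don't know why the problem occurs.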

Sample tables

We have two tables, related by a foreign key.

Table1

| Id | IdTable2 |
|:--:|:--------:|
|  1 |     4    |
|  2 |     7    |
|  3 |     8    |
|  4 |     6    |
|  5 |     4    |
|  6 |     1    |
|  7 |     1    |
|  8 |     6    |
|  9 |     7    |
| 10 |     1    |

Table2

| Id | ValueField |
|:--:|:----------:|
|  1 |      0     |
|  2 |      0     |
|  3 |      0     |
|  4 |      1     |
|  5 |      0     |
|  6 |      1     |
|  7 |      0     |

Query

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = ?);

? can be 0 or 1.

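Real data counts

The tables above are just a simplified example; the real row counts are as follows:

Table1: 60420 rows
Table2: 62 rows
Table2 with ValueField 0: 51 rows
Table2 with ValueField 1: 11 rows
Table1 with an IdTable2 whose ValueField is 0: 599 rows
Table1 with an IdTable2 whose ValueField is 1: 59821 rows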

Problem

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 1);
-- Execution time HIGH

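Well, at first I thought the subquery was the culprit, but if the subquery were the problem, the two values would not run in such wildly different times. So I figured the amount of data retrieved might be the issue, and I tried this: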

SELECT * FROM Table1 WHERE IdTable2 IN (1,2,3,5,7); -- Equivalent of ValueField 0
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (4,6); -- Equivalent of ValueField 1
-- Execution time LOW/INSTANT

Hmm... the amount of data retrieved isn't it either. Let's try something else:

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 NOT IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT

What happens if I reverse it?

SELECT * FROM Table1 WHERE IdTable2 NOT IN (SELECT Id FROM Table2 WHERE ValueField = 1);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT

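Hmm... this pretty much tells me the problem lies neither in the subquery nor in the data. So why does comparing against ValueField = 1 with IN cause the problem, when no other combination reproduces the HIGH execution time?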

Execution plans

For IN with ValueField 1:

SELECT * FROM Incidencias WHERE EstadoWorkflow in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 1);

http://s000.tinyupload.com/index.php?file_id=19036217708532467879

For IN with ValueField 0:

SELECT * FROM Incidencias WHERE EstadoWorkflow in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 0);

http://s000.tinyupload.com/index.php?file_id=49593927895920014301

For NOT IN with ValueField 0:

SELECT * FROM Incidencias WHERE EstadoWorkflow not in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 0);

http://s000.tinyupload.com/index.php?file_id=03901091628843565847

For NOT IN with ValueField 1:

SELECT * FROM Incidencias WHERE EstadoWorkflow not in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 1);

http://s000.tinyupload.com/index.php?file_id=69996775965382534356

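The queries are the same as the ones I posted in the example, but with different names. Here is the dictionary that maps the example names to the real ones.

Table1: Incidencias
Table2: EstadosWorkflows
IdTable2: EstadoWorkflow
Table2.Id: IdEstadoWorkflow
ValueField: Final

And the reverse, for easier reading:

Incidencias: Table1
EstadosWorkflows: Table2
EstadoWorkflow: IdTable2
IdEstadoWorkflow: Table2.Id
Final: ValueField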

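Real production query

This query shows the same problem in its query plan, but with additional expensive operations (such as the large EXISTS and the joins) the problem gets much worse. I really hope I haven't misled you with the simplified example.

Query using IN with value 0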

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 0) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 266 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=36115325682943356233

Query using IN with value 1

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 1) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 28506 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=72827687005228029776

Query using NOT IN with value 0

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow not in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 0) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 498 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=35554889075362686964

Query using NOT IN with value 1

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow not in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 1) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 386 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=11500314236594795220

Answer 1

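What causes the problem is that SQL Server cannot know, at optimization time, the exact values that the IN subquery will return, and therefore it cannot use the statistics.

When you have the exact values in the IN clause, they can be compared against the statistics, and SQL Server can most likely estimate quite accurately how many rows there will be and then choose the best execution plan.

I haven't tried this myself, but you could try creating filtered statistics on the id, one for ValueField 0 and one for ValueField 1, which might improve the situation.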
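As a minimal sketch of that suggestion, the filtered statistics could be created roughly as follows. The `dbo` schema and both statistics names are assumptions made for illustration; the table and column names are the real ones from the question. Whether the optimizer actually picks these statistics up for the IN subquery is not guaranteed and would need to be checked against the resulting plan.

```sql
-- Assumption: the table lives in the dbo schema.
-- The statistics names below are hypothetical.

-- Statistics on the id, restricted to rows where Final = 0
CREATE STATISTICS st_EstadosWorkflows_Final0
    ON dbo.EstadosWorkflows (IdEstadoWorkflow)
    WHERE Final = 0;

-- Statistics on the id, restricted to rows where Final = 1
CREATE STATISTICS st_EstadosWorkflows_Final1
    ON dbo.EstadosWorkflows (IdEstadoWorkflow)
    WHERE Final = 1;
```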

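Update

From the latest images it is clear that the estimate is off: the row count is estimated as 1, but it is actually 59851 after the nested loops.

This wrong estimate appears to cause a large number of table scans, because the scan was expected to run only once.

Since this is a table scan and not a clustered index scan, it looks like the table has no clustered index, nor any other index that could be used. Can you do something about that? I don't know the data volume, but an index on {{1}} with borrado as an included or regular column might help. The same thing happens in the plan for value 0, but since the row count there is only 605, 605 table scans don't take that long; when you do almost 100 times as many, it starts to take time.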

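Looking at the NOT IN plan, the structure of the search is completely different, most likely because the estimated row count is much closer to the actual one, which leads SQL Server to a different kind of plan.

So another solution might be to create a temporary table from Usuarios_Perfiles (with the perfiles restriction applied), which could help because it has only 1179 rows.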
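The temporary-table idea could look roughly like the sketch below. The temp table name `#PerfilesActivos` and the short aliases are hypothetical; table and column names are taken from the production query in the question, and the result column aliases (`y0_`, `y1_`) are dropped for readability. This is an untested rewrite, so it is worth comparing its results and plan against the original query.

```sql
-- Materialize the Usuarios_Perfiles rows whose profile is not deleted.
-- #PerfilesActivos is a hypothetical name.
SELECT up.Usuario, up.Perfil
INTO #PerfilesActivos
FROM Usuarios_Perfiles AS up
WHERE up.Perfil IN (SELECT p.IdPerfil FROM Perfiles AS p WHERE p.Borrado = 0);

-- Same query as before, joining the small temp table instead of
-- Usuarios_Perfiles plus its IN subquery.
SELECT DISTINCT TOP 15 i.IdIncidencia, i.Fecha
FROM Incidencias AS i
INNER JOIN Usuarios AS u ON i.Usuario = u.IdUsuario
INNER JOIN #PerfilesActivos AS pa ON u.IdUsuario = pa.Usuario
INNER JOIN Perfiles AS prf ON pa.Perfil = prf.IdPerfil
WHERE i.Instancia = 4
  AND i.EstadoWorkflow IN (SELECT ew.IdEstadoWorkflow
                           FROM EstadosWorkflows AS ew
                           WHERE ew.Final = 1)
  AND EXISTS (SELECT 1
              FROM Perfiles_Permisos AS pp
              INNER JOIN Permisos AS prm ON pp.Permiso = prm.IdPermiso
              WHERE pp.IdPerfilPermiso IN (206558, 206559, 209393, 209394)
                AND pp.PerfilAutorizado = prf.IdPerfil
                AND pp.TipologiaAutorizada = i.Tipologia
                AND prm.Controlador = 'Incidencias'
                AND prm.Accion = 'Index')
ORDER BY i.Fecha DESC;

DROP TABLE #PerfilesActivos;
```

Because the temp table gets its own statistics once it is populated, the optimizer has a real row count to work with instead of an estimate derived through the subquery.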

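Without the STATISTICS IO output it isn't 100% certain where the time is being spent, but it looks very much like it is due to the table scans.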
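To collect that output, the slow query can be wrapped as in the sketch below. `SET STATISTICS TIME` is an extra not mentioned above, added here on the assumption that CPU and elapsed times are also of interest; the query shown is the simplified slow case from the question.

```sql
-- Report logical/physical reads per table in the Messages output.
SET STATISTICS IO ON;
-- Assumption: timing information is also wanted.
SET STATISTICS TIME ON;

SELECT *
FROM Incidencias
WHERE EstadoWorkflow IN (SELECT IdEstadoWorkflow
                         FROM EstadosWorkflows
                         WHERE Final = 1);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

A table with a very high scan count or logical read count in that output would point to where the repeated table scans are happening.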
