IN query

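Time: 2016-07-20 10:00:17

Tags: sql-server sql-server-2008-r2

I have a question about the execution time of a query that has me puzzled. I know there are several ways to work around the problem and get a better, acceptable execution time, but I still don't know why the problem occurs.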

Sample tables

We have two tables, related by a foreign key.

Table1

| Id | IdTable2 |
|:--:|:--------:|
|  1 |     4    |
|  2 |     7    |
|  3 |     8    |
|  4 |     6    |
|  5 |     4    |
|  6 |     1    |
|  7 |     1    |
|  8 |     6    |
|  9 |     7    |
| 10 |     1    |

Table2

| Id | ValueField |
|:--:|:----------:|
|  1 |      0     |
|  2 |      0     |
|  3 |      0     |
|  4 |      1     |
|  5 |      0     |
|  6 |      1     |
|  7 |      0     |

Query

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = ?);

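? can be 0 or 1.

Actual data counts

The tables above are only a simplified example; the real tables have the following row counts:

  • Table1: 60420
  • Table2: 62
  • Table2 rows with ValueField 0: 51
  • Table2 rows with ValueField 1: 11
  • Table1 rows whose IdTable2 has ValueField 0: 599
  • Table1 rows whose IdTable2 has ValueField 1: 59821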

Problem

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 1);
-- Execution time HIGH

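Well, at first I thought the subquery was the culprit, but if the subquery were the problem, the two values would not run in such wildly different times. So I thought the amount of data retrieved might be the problem, and I tried this: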

SELECT * FROM Table1 WHERE IdTable2 IN (1,2,3,5,7); -- Equivalent of ValueField 0
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (4,6); -- Equivalent of ValueField 1
-- Execution time LOW/INSTANT

Hmm... it is not the amount of data retrieved either. Let's try something else:

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 NOT IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT

What happens if I reverse it?

SELECT * FROM Table1 WHERE IdTable2 NOT IN (SELECT Id FROM Table2 WHERE ValueField = 1);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
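
Hmm... this pretty much tells me that the problem lies neither in the subquery nor in the data. So why does comparing against ValueField = 1 with IN cause the problem, when no other combination reproduces the HIGH execution time?

Execution plans

For SQL IN, ValueField 1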

SELECT * FROM Incidencias WHERE EstadoWorkflow in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 1);

http://s000.tinyupload.com/index.php?file_id=19036217708532467879

For SQL IN, ValueField 0

SELECT * FROM Incidencias WHERE EstadoWorkflow in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 0);

http://s000.tinyupload.com/index.php?file_id=49593927895920014301

For SQL NOT IN, ValueField 0

SELECT * FROM Incidencias WHERE EstadoWorkflow not in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 0);

http://s000.tinyupload.com/index.php?file_id=03901091628843565847

For SQL NOT IN, ValueField 1

SELECT * FROM Incidencias WHERE EstadoWorkflow not in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 1);

http://s000.tinyupload.com/index.php?file_id=69996775965382534356

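These queries are the same as the ones I posted in the example, but with different names. Here is the dictionary mapping the example names to the real ones:

  • Table1: Incidencias
  • Table2: EstadosWorkflows
  • IdTable2: EstadoWorkflow
  • Table2.Id: IdEstadoWorkflow
  • ValueField: Final

And the reverse, for easier reading:

  • Incidencias: Table1
  • EstadosWorkflows: Table2
  • EstadoWorkflow: IdTable2
  • IdEstadoWorkflow: Table2.Id
  • Final: ValueField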

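Actual production query

This query and its query plans show the same problem, but with additional expensive operations (such as a huge EXISTS and several joins) the problem gets much worse. I really hope I have not misled you with the simplified example.

Using IN with value 0

Query: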
SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 0) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 266 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=36115325682943356233

Using IN with value 1

Query:
SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 1) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 28506 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=72827687005228029776

Using NOT IN with value 0

Query:
SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow not in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 0) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 498 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=35554889075362686964

Using NOT IN with value 1

Query:
SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow not in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 1) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 386 ms. Execution plan: http://s000.tinyupload.com/index.php?file_id=11500314236594795220

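1 Answer:

Answer 0 (score: 2)

The cause of the problem is that SQL Server cannot know, at optimization time, the exact values that the subquery inside the IN clause will return, so it cannot use the statistics for them.

When you have the exact values in the IN clause, they can be compared against the statistics, and SQL Server can most likely estimate quite accurately how many rows there will be and then choose the best execution plan.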
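To see which statistics the optimizer actually has available on the large table, a minimal sketch along these lines may help. It assumes the table lives in the `dbo` schema (the question does not say) and only reads catalog metadata:

```sql
-- Assumption: Incidencias is in the dbo schema.
-- Lists the statistics objects on the table and when each was last updated.
SELECT s.name                              AS stats_name,
       s.auto_created,
       s.has_filter,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Incidencias');
```

Any name returned here can then be passed to `DBCC SHOW_STATISTICS` with the `WITH HISTOGRAM` option to view the value distribution that the optimizer compares literal IN values against.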

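I have not tried this myself, but you could try creating filtered statistics on the id, separately for value field 0 and 1; that might improve the situation.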
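As a rough sketch of that suggestion, filtered statistics on the lookup table could look like the following. The statistics names and the `dbo` schema are assumptions made for illustration, and whether the optimizer actually benefits from them in this query is untested:

```sql
-- Hypothetical statistics names; the dbo schema is an assumption.
-- One filtered statistics object per value of Final.
CREATE STATISTICS st_EstadosWorkflows_Final0
    ON dbo.EstadosWorkflows (IdEstadoWorkflow)
    WHERE Final = 0
    WITH FULLSCAN;

CREATE STATISTICS st_EstadosWorkflows_Final1
    ON dbo.EstadosWorkflows (IdEstadoWorkflow)
    WHERE Final = 1
    WITH FULLSCAN;
```

Filtered statistics are available from SQL Server 2008 onward, so they should be usable on 2008 R2. Comparing the estimated row counts in the plan before and after creating them would show whether they make any difference.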

Update

From the latest images it is clear that the estimate is way off: the row count is estimated as 1, but it is actually 59851 after the nested loop.

This wrong estimate seems to lead to a huge number of table scans, because only one was expected to be performed.

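Since this is a table scan and not a clustered index scan, it looks like the table has no clustered index, and no other index that could be used either. Can you do something about that? I don't know the amount of data, but an index that carries `borrado` as an included or regular key column might help. The same thing happens in the plan for value 0, but since the row count there is only 605, 605 table scans do not take that long; when you do almost 100 times as many, it starts to take time.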

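Looking at the NOT IN plan, the structure of the search is completely different, most likely because the estimated row count is much closer to the actual one, which lets SQL Server choose that kind of plan.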

So another solution that might help is to create a temp table from Usuarios_Perfiles (with the perfiles limitation), since it has only 1179 rows.
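A minimal sketch of the temp-table idea, using the column names that appear in the production query. The temp table name `#UsuariosPerfilesActivos` is hypothetical, and the rest of the production query is only hinted at in the comments:

```sql
-- Materialize the profile assignments of non-deleted profiles once.
-- #UsuariosPerfilesActivos is a hypothetical name chosen for this sketch.
SELECT up.Usuario, up.Perfil
INTO #UsuariosPerfilesActivos
FROM Usuarios_Perfiles AS up
WHERE up.Perfil IN (SELECT p.IdPerfil
                    FROM Perfiles AS p
                    WHERE p.Borrado = 0);

-- The production query would then join the temp table in place of
-- Usuarios_Perfiles and drop the inline "perfil in (...)" condition:
--   inner join #UsuariosPerfilesActivos perfiles5_
--       on usuario1_.IdUsuario = perfiles5_.Usuario
```

Because the temp table is populated before the main query is compiled, SQL Server can build statistics on it with the real row count, which may steer the optimizer away from the repeated table scans.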

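Without the statistics IO output it is not 100% certain where the time is being spent, but it looks very much like it is due to the table scans.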
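For reference, the statistics IO output can be collected with something like the sketch below, shown here with the simplified slow query from the question. The same wrapper would apply to the production query:

```sql
-- Report per-table reads and CPU/elapsed time for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Simplified slow query from the question (Final = 1).
SELECT *
FROM Incidencias
WHERE EstadoWorkflow IN (SELECT IdEstadoWorkflow
                         FROM EstadosWorkflows
                         WHERE Final = 1);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The output appears in the Messages tab in Management Studio; the scan count and logical reads per table should show whether the repeated table scans are where the time goes.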