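Improving the performance of a SQL query

Time: 2015-08-25 14:27:53

Tags: c# sql-server

I have a query, written in C# code, that retrieves results from a database. The query changes depending on user input. One input is called 'challenge'; depending on what the user selects in the UI, 'challenge' can take one of three values: 0, 1, or 2. Challenge 0 and challenge 1 each produce a different query, and challenge 2 produces the union of the challenge 0 query and the challenge 1 query.

Here is my current code for generating the SQL query: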

         string sql = @"SELECT
            tbl.given_id, name, message, tbl.emplid, category, created_date
        FROM
            dbo.myTable tbl

        WHERE
           created_date >= @dtFrom
            AND created_date < @dtTo
        ";


    //Include only challenge 0  
         if(challenge == 0)
                        {
                            sql += " AND emplid IN (" + sb.ToString() + ")";  //sb.ToString() is list of emplids selected from UI.
                        }

    //Include only challenge 1    
         if (challenge == 1)
                        {
                            sql += " AND given_Id IN (SELECT DISTINCT rec.given_Id  from dbo.Recipient rec WHERE rec.given_Id = tbl.given_Id AND rec.emplid IN (" + sb.ToString() + ")) ";
                        }  


//Include both challenge 0 and 1

    if (challenge == 2)
                    {
                        sql = String.Format(@"
    SELECT * FROM
    (
        (
            SELECT
                tbl.given_id, name, message, tbl.emplid, category, created_date
            FROM
                dbo.myTable tbl             
            WHERE
                created_date >= @dtFrom
                AND created_date < @dtTo
                AND tbl.emplid IN ({0})
        )
        UNION
        (
           SELECT
                tbl.given_id, name, message, tbl.emplid, category, created_date
            FROM
                dbo.myTable tbl             
            WHERE
                created_date >= @dtFrom
                AND created_date < @dtTo
                AND given_Id IN (SELECT DISTINCT rec.given_Id  from dbo.Recipient rec WHERE rec.given_Id = tbl.given_Id AND rec.emplid IN ({0}))

        )

    )
    ", sb.ToString());

                    }   

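So when challenge == 2, the query is built as the union of the queries generated for challenge == 0 and challenge == 1.
The code for challenge == 2 just repeats the two queries above. Is there another way to write this kind of query so that I can improve it and get better performance?

Thanks.

Edit - writing out just the generated queries to simplify my question:

Query 1 -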

SELECT
                tbl.given_id, name, message, tbl.emplid, category, created_date
            FROM
                dbo.myTable tbl             
            WHERE
                created_date >= @dtFrom
                AND created_date < @dtTo
                AND tbl.emplid IN (@lst_emplids)  

Query 2

SELECT
                    tbl.given_id, name, message, tbl.emplid, category, created_date
                FROM
                    dbo.myTable tbl             
                WHERE
                    created_date >= @dtFrom
                    AND created_date < @dtTo
                    AND given_Id IN (SELECT DISTINCT rec.given_Id  from dbo.Recipient rec WHERE rec.given_Id = tbl.given_Id AND rec.emplid IN (@lst_emplids))

Query 3

SELECT * FROM
        (
            (
                SELECT
                    tbl.given_id, name, message, tbl.emplid, category, created_date
                FROM
                    dbo.myTable tbl             
                WHERE
                    created_date >= @dtFrom
                    AND created_date < @dtTo
                    AND tbl.emplid IN (@lst_emplids)
            )
            UNION
            (
               SELECT
                    tbl.given_id, name, message, tbl.emplid, category, created_date
                FROM
                    dbo.myTable tbl             
                WHERE
                    created_date >= @dtFrom
                    AND created_date < @dtTo
                    AND given_Id IN (SELECT DISTINCT rec.given_Id  from dbo.Recipient rec WHERE rec.given_Id = tbl.given_Id AND rec.emplid IN (@lst_emplids))

            )

        )

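5 answers:

Answer 0: (score: 2)

The nested IN business in the second and third queries doesn't look great. It may be personal preference more than anything, but I have seen cases where the optimizer failed to produce the best plan because of nested IN operators. That is especially true when a DISTINCT gets in the way - I'm not sure whether it is dutifully ignored or actually executed, when all you really need is an EXISTS (see the bottom of this answer). I can't say much more without seeing the generated plans.

Actually, the way the subquery is written is redundant - you are using both WHERE given_Id IN and the correlation WHERE rec.given_Id = tbl.given_Id. That may be the real problem. We only want one of them.

What would I do? First, I would pull the correlated subquery out into a join: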

select tbl.given_id, tbl.name, tbl.message, tbl.emplid, tbl.category, tbl.created_date
from dbo.myTable tbl
join dbo.Recipient r on tbl.given_Id = r.given_Id
where created_date >= @dtFrom
and created_date < @dtTo
and r.emplid IN ( ... )

Second, for the "both" case, I would move away from UNION and use OR instead. The duplicate elimination is unnecessary and expensive:

select tbl.given_id, tbl.name, tbl.message, tbl.emplid, tbl.category, tbl.created_date
from dbo.myTable tbl
join dbo.Recipient r on tbl.given_Id = r.given_Id
where created_date >= @dtFrom
and created_date < @dtTo
and (r.emplid IN ( ... ) or tbl.emplid IN ( ... ))

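Note that I'm assuming your sb.ToString() is safe, which is probably a bad idea. In my book, dynamic SQL is the absolute last resort. You could write this as a stored procedure; instead of IN, you would pass a table-valued parameter and join to it.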
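Short of a table-valued parameter, the IN list can at least be built from parameter placeholders rather than from concatenated values. A minimal sketch, not taken from the question: `InListBuilder`, its `Build` method, and the `@e0, @e1, ...` parameter names are all hypothetical, and the type of emplid is assumed unknown, so the helper is generic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the original code): turns a list of values
// into "@e0, @e1, ..." placeholders plus the name/value pairs that would be
// added to the command's Parameters collection.
public static class InListBuilder
{
    public static (string Placeholders, List<KeyValuePair<string, object>> Parameters)
        Build<T>(IEnumerable<T> values, string prefix = "@e")
    {
        // One parameter per value; the name is the prefix plus the position.
        var parameters = values
            .Select((value, index) => new KeyValuePair<string, object>(prefix + index, value))
            .ToList();

        // "IN ()" is not valid SQL, so refuse an empty list up front.
        if (parameters.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));

        var placeholders = string.Join(", ", parameters.Select(p => p.Key));
        return (placeholders, parameters);
    }
}
```

The placeholders string would go where sb.ToString() is used today, and each pair would be passed to command.Parameters.AddWithValue(pair.Key, pair.Value). SQL Server limits the number of parameters per command (about 2100), so very long lists still call for the table-valued parameter approach.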

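I'm also assuming things about your schema; you may well need a DISTINCT in that last query, since the join returns one row for every matching recipient. Or, frankly, you could change it to an EXISTS: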

select tbl.given_id, tbl.name, tbl.message, tbl.emplid, tbl.category, tbl.created_date
from dbo.myTable tbl
where created_date >= @dtFrom
and created_date < @dtTo
and (tbl.emplid IN ( ... ) or exists (
    select 1
    from dbo.Recipient r
    where r.given_Id = tbl.given_Id
    and r.emplid IN ( ... )))
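On the C# side, the three cases then come down to choosing one predicate, with no UNION and no repeated SELECT. A rough sketch under stated assumptions: `ChallengeQuery`, `bySender`, and `byRecipient` are names invented here, and `idList` is assumed to be a comma-separated list of parameter placeholders such as "@e0, @e1", never raw user input.

```csharp
using System;

// Hypothetical query builder (names are illustrative, not from the question).
public static class ChallengeQuery
{
    private const string BaseSql =
        "SELECT tbl.given_id, tbl.name, tbl.message, tbl.emplid, tbl.category, tbl.created_date" +
        " FROM dbo.myTable tbl" +
        " WHERE tbl.created_date >= @dtFrom AND tbl.created_date < @dtTo";

    // idList must contain parameter placeholders only (assumption stated above).
    public static string Build(int challenge, string idList)
    {
        // Challenge 0: filter on the emplid stored in myTable itself.
        string bySender = "tbl.emplid IN (" + idList + ")";

        // Challenge 1: filter on the emplid of any matching Recipient row.
        string byRecipient =
            "EXISTS (SELECT 1 FROM dbo.Recipient r" +
            " WHERE r.given_Id = tbl.given_Id AND r.emplid IN (" + idList + "))";

        switch (challenge)
        {
            case 0: return BaseSql + " AND " + bySender;
            case 1: return BaseSql + " AND " + byRecipient;
            case 2: return BaseSql + " AND (" + bySender + " OR " + byRecipient + ")";
            default: throw new ArgumentOutOfRangeException(nameof(challenge));
        }
    }
}
```

Because each row of myTable is either kept or dropped exactly once, the challenge 2 query needs neither UNION nor DISTINCT.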

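Answer 1: (score: 2)

There are a number of things you can do. First, it pays to let SQL do the heavy lifting. You can pass parameters to it and let it figure out how to arrange the query based on them.

To take full advantage, you'll need to wrap the SQL in a predefined procedure or function. In this case I think I'd use a function - but either is fine. My example is a function.

Either way, you can have SQL treat your list of employee IDs as a table. To get that support, you need to create a table type in SQL: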

create type dbo.IntIdsType as table ( Id int )

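Table types are only declarations of structure - they are not tables themselves. You can then declare variables of that type.

Once you have a table type, your function takes all the parameters and figures out what to do:

Edit 2: I had some sloppy conditional logic earlier - it has been revised in the function below: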

create function dbo.GetMyTable
( 
    @from datetime, 
    @to datetime, 
    @challenge int, 
    @employeeIds dbo.IntIdsType readonly 
)
returns table as return
    select
        tbl.given_id, name, message, tbl.emplid, category, created_date
    from
        dbo.myTable tbl             
    where
        created_date >= @from 
        and
        created_date < @to 
        and
        (
            (
                @challenge = 0 
                and 
                tbl.emplid in ( select Id from @employeeIds )
            )
            or
            (
                @challenge = 1
                and
                given_Id in 
                (
                    select 
                        rec.given_Id  
                    from 
                        dbo.Recipient rec 
                    where 
                        rec.given_Id = tbl.given_Id 
                        and 
                        rec.emplid in ( select Id from @employeeIds )
                )
            )
            or
            (
                @challenge = 2
                and
                (
                    (
                        tbl.emplid in ( select Id from @employeeIds )
                    )
                    or
                    (
                        given_Id in 
                        (
                            select 
                                rec.given_Id  
                            from 
                                dbo.Recipient rec 
                            where 
                                rec.given_Id = tbl.given_Id 
                                and 
                                rec.emplid in ( select Id from @employeeIds )
                        )
                    )
                )
            )
        )

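Note the absence of a union - and how this lets SQL decide which predicates to apply based on the parameter values. Also note how the table-valued parameter that was passed in is used just as if it were a table.

Setting up the call is a bit noisy - it might look like this: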

async Task GetMyData( DateTime fromDate, DateTime toDate, int challenge, params int[ ] employeeIds )
{

  using ( var connection = new SqlConnection( "my connection string" ) )
  {
    connection.Open( );  // open the connection before creating the command
    using ( var command = connection.CreateCommand( ) )
    {
      //--> set up command basics...
      command.CommandText = @"
        select 
          given_id, name, message, emplid, category, created_date 
        from 
          GetMyTable( @from, @to, @challenge, @employeeIds )";
      command.CommandType = System.Data.CommandType.Text;

      //--> easy parameters...
      command.Parameters.AddWithValue( "@from", fromDate );
      command.Parameters.AddWithValue( "@to", toDate );
      command.Parameters.AddWithValue( "@challenge", challenge );

      //--> table-valued parameter...
      var table = new DataTable( "EmployeeIds" );
      table.Columns.Add( "Id", typeof( int ) );
      foreach ( var id in employeeIds ) table.Rows.Add( id );
      var employeeIdsParameter = new SqlParameter( "@employeeIds", SqlDbType.Structured );
      employeeIdsParameter.TypeName = "dbo.IntIdsType";
      employeeIdsParameter.Value = table;
      command.Parameters.Add( employeeIdsParameter );

      //--> do the work...
      using ( var reader = await command.ExecuteReaderAsync( ) )
      {
        while ( await reader.ReadAsync( ) )
        {
          //...
        }
      }
    }
  }
}

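The advantage of all this - besides the hoped-for performance gain - is that you avoid the awkward logic for building SQL strings, along with the SQL injection attack surface that approach brings.

Edit:

Per the comments: the C# code doesn't return a string to the caller - it shows how to set up a command object, how to add parameters to it, and how to execute it. That would probably be done in the procedure that currently calls your string-building code. It's a fairly typical setup for this kind of thing - but there are many variations. For example, the calling code could hand you the command object, which you then fill in. Many different patterns are in use.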

Answer 2: (score: 0)

Why not write a single query along these lines:

SELECT <stuff in first query>
WHERE ... 
AND @Challenge in (1,3)
UNION
SELECT <stuff in second query>
WHERE ...
AND @Challenge in (2,3)

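Let the query optimizer do the hard work for you.

Answer 3: (score: -1)

I don't like concatenated SQL queries. I prefer either a separate query for each condition, or a single query that handles all of the conditions. It depends on the complexity.

Since there are only two independent conditions, plus a third that combines them, we can use Boolean logic.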

SELECT
    tbl.given_id, 
    name, 
    [message], 
    tbl.emplid, 
    category, 
    created_date
FROM
    dbo.myTable tbl             
WHERE 
    created_date >= @dtFrom
    AND created_date < @dtTo
    AND ((@challenge <> 2 AND tbl.emplid IN (@lst_emplids)) 
      OR (@challenge <> 1 AND given_Id IN (SELECT DISTINCT rec.given_Id  from dbo.Recipient rec WHERE rec.given_Id = tbl.given_Id AND rec.emplid IN (@lst_emplids))))

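If @challenge is 1 or 3, the items matching tbl.emplid IN (@lst_emplids) are selected.
If @challenge is 2 or 3, the items matching given_Id IN (...) are selected.
So if @challenge is 3, both sets of items are returned, and if it is 1 or 2, only one subset.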
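The Boolean logic can be checked outside the database. A small sketch that mirrors the WHERE clause above, using this answer's 1/2/3 values for @challenge (the question itself uses 0/1/2); `ChallengeFilter`, `Keeps`, and the parameter names are made up for illustration.

```csharp
// Hypothetical truth-table check (names are illustrative only).
public static class ChallengeFilter
{
    // Mirrors: (@challenge <> 2 AND emplidMatches) OR (@challenge <> 1 AND recipientMatches)
    // emplidMatches    stands for: tbl.emplid IN (@lst_emplids)
    // recipientMatches stands for: given_Id IN (SELECT ... FROM dbo.Recipient ...)
    public static bool Keeps(int challenge, bool emplidMatches, bool recipientMatches)
    {
        return (challenge != 2 && emplidMatches)
            || (challenge != 1 && recipientMatches);
    }
}
```

With challenge 1 only the first condition can keep a row, with challenge 2 only the second, and with challenge 3 either one is enough.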

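Answer 4: (score: -2)

A more sensible solution is to call this from a stored procedure that uses dynamic SQL to build the query, with a parameterized query instead of concatenating values from the UI, which leaves your code open to SQL injection.

I would do something like this...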

CREATE PROCEDURE my_SP
   @dtFrom      DATE
 , @dtTo        DATE
 , @Sb          VARCHAR(8000)
 , @challenge   INT
 AS
 BEGIN
   SET NOCOUNT ON;
DECLARE @Sql NVARCHAR(MAX);

SET @Sql = N'  declare @xml xml = N''<root><r>'' + replace(@Sb,'','',''</r><r>'') + ''</r></root>'';
SELECT tbl.given_id, name, message, tbl.emplid, category, created_date
FROM  dbo.myTable tbl
WHERE created_date >= @dtFrom
 AND  created_date < @dtTo '  -- trailing space keeps the appended AND from fusing with @dtTo

+ CASE WHEN (@challenge = 0)
         THEN N'AND emplid IN ( select r.value(''.'',''varchar(max)'') as item
                                from @xml.nodes(''//root/r'') as records(r))' 
       WHEN  (@challenge = 1)
         THEN N'AND given_Id IN (SELECT rec.given_Id  
                                from dbo.Recipient rec 
                                WHERE rec.given_Id = tbl.given_Id 
                                AND rec.emplid IN ( select r.value(''.'',''varchar(max)'') as item
                                                    from @xml.nodes(''//root/r'') as records(r))
                                )'
        WHEN (@challenge = 2)
          THEN 'AND 
                    ( emplid IN ( select r.value(''.'',''varchar(max)'') as item
                                  from @xml.nodes(''//root/r'') as records(r))
                      OR 
                      given_Id IN (SELECT rec.given_Id  
                                from dbo.Recipient rec 
                                WHERE rec.given_Id = tbl.given_Id 
                                AND rec.emplid IN ( select r.value(''.'',''varchar(max)'') as item
                                                    from @xml.nodes(''//root/r'') as records(r))
                                )
                     )'
         ELSE N'' END

Exec sp_executesql @Sql 
                  ,N'@dtFrom DATE, @dtTo DATE, @Sb VARCHAR(8000)'
                  ,@dtFrom 
                  ,@dtTo 
                  ,@Sb 
END