Question

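I have 4 million rows in a table (about 300 GB in size), and I want to read all of the rows from the SQL Server database. I am using the following C# code, and it takes a long time. Please suggest some improvements.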

            List<int> hit_block_index = new List<int>();

            /* Here I do some other processing and populate hit_block_index with integers */

            string _query = "SELECT TraceName,BlockVector FROM Trace";

            SqlConnection _connection = new SqlConnection(connection_string);

            _connection.Open();

            SqlCommand _command = new SqlCommand(_query, _connection);

            SqlDataReader data_reader = _command.ExecuteReader();

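            Byte[] block_vector = null;

            string trace_name = null;

            // Best trace seen so far (name, bits, and hit count)
            string most_covered_trace = null;

            BitArray most_covered_array = null;

            int max_coverage = 0;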

            while (data_reader.Read())
            {
                  int coverage = 0;

                  block_vector = (byte[])data_reader["BlockVector"];

                  trace_name = (string)data_reader["TraceName"];

                  BitArray trace = new BitArray(block_vector);

                  // Count how many of the requested block indexes are set in this trace
                  foreach (int x in hit_block_index)
                  {
                       if (trace[x])
                       {
                           coverage++;
                       }
                  }

                  Console.WriteLine("hit count is:" + coverage);

                  if (coverage > max_coverage)
                  {
                         most_covered_trace = trace_name;
                         most_covered_array = trace;
                         max_coverage = coverage;
                  }
           }

Answer 1

Something like this might work. I'm not sure about its efficiency yet; that may depend on how many hits you are looking for:

create type HitBlocks as table (
    HitIndex int not null
)
go
create procedure FindMaxCover
    @Hits HitBlocks readonly
as
    ;With DecomposedBlocks as (
        select (HitIndex/8)+1 as ByteIndex,POWER(2,(HitIndex%8)) as BitMask
        from @Hits
    ), Coverage as (
        select
            t.TraceName,SUM(CASE WHEN SUBSTRING(t.BlockVector,db.ByteIndex,1) & BitMask != 0 THEN 1 ELSE 0 END) as Coverage
        from
            Trace t
                cross join
            DecomposedBlocks db
        group by
            t.TraceName
    ), Ranked as (
        select *,RANK() OVER (ORDER BY Coverage desc) as rk
        from Coverage
    )
    select
        t.TraceName,
        t.BlockVector,
        r.Coverage
    from
        Ranked r
            inner join
        Trace t
            on
                r.TraceName = t.TraceName
    where rk = 1

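Currently, this returns multiple rows if several results share the same coverage. You may also need to adjust for a) some off-by-one errors between my expectations and yours, and b) possible endianness issues in computing the correct BitMask values.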
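One way to check the off-by-one and bit-order concerns before running the procedure on the real table is to compare the same arithmetic against BitArray on a small sample. The following is only a sketch: the class name BitMapping, its methods, and the sample bytes are made up for illustration. It assumes BlockVector holds exactly the bytes that were passed to the BitArray constructor.

```csharp
using System;
using System.Collections;

public static class BitMapping
{
    // 1-based byte position, which is what SUBSTRING expects in T-SQL.
    public static int ByteIndex(int hitIndex) => hitIndex / 8 + 1;

    // Mask for the bit inside that byte. BitArray(byte[]) maps the least
    // significant bit of each byte to the lowest index.
    public static int BitMask(int hitIndex) => 1 << (hitIndex % 8);

    // True when the bit is set, computed the same way as the stored procedure.
    public static bool IsHit(byte[] blockVector, int hitIndex) =>
        (blockVector[ByteIndex(hitIndex) - 1] & BitMask(hitIndex)) != 0;

    public static void Main()
    {
        byte[] sample = { 0x05, 0x80 };   // hypothetical data: bits 0, 2 and 15 set
        var bits = new BitArray(sample);

        for (int i = 0; i < bits.Length; i++)
        {
            if (bits[i] != IsHit(sample, i))
                throw new InvalidOperationException("Mismatch at bit " + i);
        }
        Console.WriteLine("BitArray and mask arithmetic agree");
    }
}
```

If the two ever disagree for your data, the ByteIndex or BitMask expression in the procedure is the place to adjust.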

From your code, you would populate a DataTable with the values you currently store in hit_block_index and pass it as the @Hits parameter.
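A rough sketch of that call from C# might look like the following. The class and method names (MaxCoverClient, ToHitTable, BuildCommand, FindMostCoveredTrace) are invented for illustration, and the unlimited command timeout is my assumption, since a scan over this much data may run past the default timeout. It uses System.Data.SqlClient, as the question's code appears to.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class MaxCoverClient
{
    // Copies the hit indexes into a DataTable shaped like the HitBlocks type.
    public static DataTable ToHitTable(IEnumerable<int> hitBlockIndex)
    {
        var table = new DataTable();
        table.Columns.Add("HitIndex", typeof(int));
        foreach (int index in hitBlockIndex)
            table.Rows.Add(index);
        return table;
    }

    // Builds the stored procedure call with @Hits as a table-valued parameter.
    public static SqlCommand BuildCommand(SqlConnection connection, DataTable hits)
    {
        var command = new SqlCommand("FindMaxCover", connection);
        command.CommandType = CommandType.StoredProcedure;
        command.CommandTimeout = 0; // assumption: no time limit for a long scan

        SqlParameter parameter = command.Parameters.AddWithValue("@Hits", hits);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "HitBlocks";
        return command;
    }

    // Runs the procedure and returns the name of the first best trace, or null.
    public static string FindMostCoveredTrace(string connectionString, IEnumerable<int> hitBlockIndex)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = BuildCommand(connection, ToHitTable(hitBlockIndex)))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // The procedure can return several rows on ties; this takes the first.
                return reader.Read() ? (string)reader["TraceName"] : null;
            }
        }
    }
}
```

This way only the hit indexes travel to the server and only the winning rows come back, instead of the whole table being pulled to the client.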

Answer 2

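If you really have to read all the data... put your code into the database, via a stored procedure or whatever your engine allows. Transferring the entire database makes no sense.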

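Beyond that, you should consider choosing a different strategy. Example 1: you could create a trigger on insert. When a value is inserted, you could recalculate the coverage without reading all the data (if that is possible). Example 2: you could use SQL Azure Federations or an Azure Worker Role to scale out your problem.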

Reading from SQL Server takes a lot of time using C#
