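I have 4 million rows in a table (about 300 GB in total), and I want to read every row of that table from a SQL Server database. I am using the following C# code, and it takes a long time. Please suggest some improvements.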
List<int> hit_block_index = new List<int>();
/* Here I do some other processing and populate hit_block_index with integers */
string _query = "SELECT TraceName,BlockVector FROM Trace";
SqlConnection _connection = new SqlConnection(connection_string);
_connection.Open();
SqlCommand _command = new SqlCommand(_query, _connection);
SqlDataReader data_reader = _command.ExecuteReader();
Byte[] block_vector = null;
string trace_name = null;
BitArray trace = null;
// Best result seen so far (declarations were missing from the snippet).
string most_covered_trace = null;
BitArray most_covered_array = null;
int max_coverage = 0;
while (data_reader.Read())
{
    int coverage = 0;
    block_vector = (byte[])data_reader["BlockVector"];
    trace_name = (string)data_reader["TraceName"];
    trace = new BitArray(block_vector);
    // Count how many of the requested blocks this trace hits.
    foreach (int x in hit_block_index)
    {
        if (trace[x])
        {
            coverage++;
        }
    }
    Console.WriteLine("hit count is:" + coverage);
    if (coverage > max_coverage)
    {
        most_covered_trace = trace_name;
        most_covered_array = trace;
        max_coverage = coverage;
    }
}
Answer 0 (score: 1)
Something like this might work. I'm not sure about its efficiency yet; that may depend on how many hits you are looking for:
create type HitBlocks as table (
HitIndex int not null
)
go
create procedure FindMaxCover
@Hits HitBlocks readonly
as
;With DecomposedBlocks as (
select (HitIndex/8)+1 as ByteIndex,POWER(2,(HitIndex%8)) as BitMask
from @Hits
), Coverage as (
select
t.TraceName,SUM(CASE WHEN SUBSTRING(t.BlockVector,db.ByteIndex,1) & BitMask != 0 THEN 1 ELSE 0 END) as Coverage
from
Trace t
cross join
DecomposedBlocks db
group by
t.TraceName
), Ranked as (
select *,RANK() OVER (ORDER BY Coverage desc) as rk
from Coverage
)
select
t.TraceName,
t.BlockVector,
r.Coverage
from
Ranked r
inner join
Trace t
on
r.TraceName = t.TraceName
where rk = 1
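At the moment, this returns multiple rows if several results have the same coverage. You may also need to adjust for a) some off-by-one errors between my expectations and yours, and b) possible endianness issues in calculating the correct BitMask values.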
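To check the off-by-one and bit-order assumptions, here is a small sketch of how .NET's BitArray(byte[]) constructor numbers bits: bit index i lives in byte i / 8 under the mask 1 << (i % 8), so the least significant bit comes first. T-SQL's SUBSTRING is 1-based, which is why the procedure adds 1 to the byte index. The names BitMapping, ByteIndexForSql, BitMask and IsSet are made up for illustration and do not appear in the original code.

```csharp
using System;
using System.Collections;

public static class BitMapping
{
    // 1-based byte position, matching SUBSTRING(BlockVector, ByteIndex, 1) in T-SQL.
    public static int ByteIndexForSql(int hitIndex) => hitIndex / 8 + 1;

    // Mask of the bit inside that byte; same value as POWER(2, HitIndex % 8).
    public static int BitMask(int hitIndex) => 1 << (hitIndex % 8);

    // Tests a bit the way the stored procedure does, directly on the raw bytes.
    public static bool IsSet(byte[] blockVector, int hitIndex) =>
        (blockVector[ByteIndexForSql(hitIndex) - 1] & BitMask(hitIndex)) != 0;

    public static void Main()
    {
        // Sample data: byte 0 has bits 0 and 2 set, byte 1 has its top bit (index 15) set.
        var blockVector = new byte[] { 0x05, 0x80 };
        var trace = new BitArray(blockVector);

        for (int i = 0; i < trace.Length; i++)
        {
            // The BitArray view and the byte/mask arithmetic should agree for every index.
            if (trace[i] != IsSet(blockVector, i))
                throw new InvalidOperationException("Mismatch at bit " + i);
        }
        Console.WriteLine("BitArray and byte/mask arithmetic agree.");
    }
}
```

If BlockVector was written by something other than BitArray.CopyTo or an equivalent LSB-first layout, the mask would need to be mirrored (for example 128 >> (i % 8)); that depends on how the data was stored, which the question does not say.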
In your code, you would populate a DataTable with the values you currently store in hit_block_index and pass it as the @Hits parameter.
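As a rough sketch of that last step, the following builds a DataTable whose single column matches the HitBlocks type and passes it as a table-valued parameter. It assumes the type and the procedure were created in the dbo schema and that TraceName is a string column, as the cast in the question suggests. The class and method names (HitsParameter, BuildHitsTable, and the C# wrapper FindMaxCover) are illustrative, and the zero command timeout is an assumption rather than a recommendation.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class HitsParameter
{
    // Builds a table with one int column named like the HitBlocks type's column.
    public static DataTable BuildHitsTable(IEnumerable<int> hitBlockIndex)
    {
        var table = new DataTable();
        table.Columns.Add("HitIndex", typeof(int));
        foreach (int index in hitBlockIndex)
            table.Rows.Add(index);
        return table;
    }

    // Returns the name of one most-covered trace, or null if no rows come back.
    public static string FindMaxCover(string connectionString, IEnumerable<int> hitBlockIndex)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.FindMaxCover", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.CommandTimeout = 0; // assumption: wait indefinitely, the scan may run long

            SqlParameter hits = command.Parameters.AddWithValue("@Hits", BuildHitsTable(hitBlockIndex));
            hits.SqlDbType = SqlDbType.Structured; // marks the value as a table-valued parameter
            hits.TypeName = "dbo.HitBlocks";       // assumption: the type lives in dbo

            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // Ties produce several rows; this sketch only looks at the first one.
                return reader.Read() ? (string)reader["TraceName"] : null;
            }
        }
    }
}
```

Called as HitsParameter.FindMaxCover(connection_string, hit_block_index), this sends only the hit indexes to the server and brings back the winning rows, instead of streaming the whole table to the client.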
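Answer 1 (score: 1)
If you really must read all of the data, move your code into the database, through a stored procedure or whatever else your engine allows. Transferring the entire database to the client makes no sense.
Beyond that, you should consider a different strategy.
Example 1: You could create a trigger on insert. When a value is inserted, you can recalculate the coverage without reading all the data (if that is possible).
Example 2: You could use SQL Azure Federations or an Azure Worker Role to scale out your problem.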