Faster way to insert records into a database

Time: 2018-02-09 18:13:28

Tags: c# sql sql-server parallel.foreach

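I currently have a database table of about 70,000 names. I want to take 3000 random records from that table and insert them into another table in which each name has one row for every other name. In other words, the new table should look like this: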

john, jerry
john, alex
john, sam
jerry, alex
jerry, sam
alex, sam

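This means the number of rows I add is a running sum over the names: n names produce n(n-1)/2 pairs, so the 4 names above give 6 rows. My current strategy is to use two nested for loops that add one row at a time, then remove the first name from the list of names still to be added, so that I never get the same pair in a different order.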

My question is: is there a faster way to do this, perhaps with a parallel for loop, PLINQ, or some other option I haven't mentioned?
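As an illustration of the row count described above, here is a minimal Python sketch (not part of the original question). The helper names `pair_count` and `unordered_pairs` are hypothetical. It shows that the nested-loop strategy is equivalent to enumerating 2-combinations, and how many rows a 3000-name sample implies.

```python
from itertools import combinations

def pair_count(n):
    """Number of unordered pairs among n distinct names: n(n-1)/2."""
    return n * (n - 1) // 2

def unordered_pairs(names):
    """Return each pair of names exactly once, keeping the input order."""
    return list(combinations(names, 2))

names = ["john", "jerry", "alex", "sam"]
pairs = unordered_pairs(names)
for first, second in pairs:
    print(f"{first}, {second}")

print(pair_count(len(names)))  # rows for the 4-name example
print(pair_count(3000))        # rows for a 3000-name sample
```

For 3000 names this comes to roughly 4.5 million rows, which is why inserting them one at a time from client code is slow.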

3 Answers:

Answer 0: (score: 1)

You still need to work out the random part; this query produces the pairs:

select t1.name, t2.name 
from table t1 
join table t2 
on t1.name < t2.name 
order by t1.name, t2.name

For the random part, order by NEWID():

declare @t table (name varchar(10) primary key);
insert into @t (name) values 
       ('Adam')
     , ('Bob')
     , ('Charlie')
     , ('Den')
     , ('Eric')
     , ('Fred');
declare @top table (name varchar(10) primary key);
insert into @top (name)
select top (4) name from @t order by NEWID();

select * from @top;

select a.name, b.name
from @top a  
join @top b 
  on a.name < b.name  
order by a.name, b.name;
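The same idea can be tried end to end without SQL Server. The sketch below is an assumption-laden stand-in, not the answer's own code: it uses SQLite through Python's `sqlite3` module, `ORDER BY RANDOM()` in place of `ORDER BY NEWID()`, and hypothetical table and column names (`names`, `sample`, `pairs`, `name_a`, `name_b`). The point it illustrates is that one set-based `INSERT ... SELECT` can replace the row-at-a-time loops.

```python
import sqlite3

def build_pairs(all_names, sample_size):
    """Sample names at random and insert every unordered pair in one statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE sample (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE pairs (name_a TEXT, name_b TEXT)")
    conn.executemany(
        "INSERT INTO names (name) VALUES (?)", [(n,) for n in all_names]
    )
    # SQLite's ORDER BY RANDOM() plays the role of ORDER BY NEWID() here.
    conn.execute(
        "INSERT INTO sample (name) "
        "SELECT name FROM names ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )
    # a.name < b.name keeps each pair once and excludes self-pairs.
    conn.execute(
        "INSERT INTO pairs (name_a, name_b) "
        "SELECT a.name, b.name FROM sample a JOIN sample b ON a.name < b.name"
    )
    conn.commit()
    return conn

conn = build_pairs(["Adam", "Bob", "Charlie", "Den", "Eric", "Fred"], 4)
rows = conn.execute(
    "SELECT name_a, name_b FROM pairs ORDER BY name_a, name_b"
).fetchall()
print(len(rows))
```

Storing the sample in its own table first means both sides of the self-join see the same random sample.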

Answer 1: (score: 1)

Given a table "Names" with an nvarchar(50) column "Name" that contains this data:

Adam
Bob
Charlie
Den
Eric
Fred

this query:

-- Work out the fraction we need
DECLARE @frac AS float;
SELECT @frac = CAST(35000 AS float) / 70000;

-- Get roughly that sample size
WITH ts AS (
SELECT Name FROM Names
WHERE @frac >= CAST(CHECKSUM(NEWID(), Name) & 0x7FFFFFFF AS float) / CAST (0X7FFFFFFF AS int)
)

-- Match each entry in the sample with all the other entries
SELECT x.Name + ', ' + y.Name
FROM ts AS X
CROSS JOIN
Names AS Y
WHERE x.Name <> y.Name

produces results such as:
Adam, Bob
Adam, Charlie
Adam, Den
Adam, Eric
Adam, Fred
Charlie, Adam
Charlie, Bob
Charlie, Den
Charlie, Eric
Charlie, Fred
Den, Adam
Den, Bob
Den, Charlie
Den, Eric
Den, Fred

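The results vary from run to run; a sample of 3000 out of 70000 will produce about 3000 * 70000 result rows. I used 35000./70000 because my test data has only 6 rows (for the real table the fraction would be 3000./70000).

If you only want to pair names from within the sample, change CROSS JOIN Names AS Y to CROSS JOIN ts AS Y; there will then be about 3000 * 3000 result rows.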
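To see why the sample size is only approximate, here is a small Python sketch of the same sampling idea (an illustration, not the answer's code). It assumes the filter behaves like an independent coin flip per row with probability equal to the fraction; `bernoulli_sample` is a hypothetical helper, and the `NAME_` values are made-up test data.

```python
import random

def bernoulli_sample(names, frac, rng):
    """Keep each name independently with probability frac."""
    return [name for name in names if rng.random() < frac]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
names = [f"NAME_{i}" for i in range(70000)]
sample = bernoulli_sample(names, 3000 / 70000, rng)
print(len(sample))  # near 3000, but generally not exactly 3000
```

If exactly 3000 names are required, a `TOP (3000) ... ORDER BY NEWID()` selection, as in the other answers, gives a fixed count at the cost of sorting the whole table.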

Reference: the random sampling method is taken from the "Important" section of Limiting Result Sets by Using TABLESAMPLE.

Answer 2: (score: 0)

This uses a numbers table to simulate the names.

A single query, using a triangular join:

WITH all_names 
     AS (SELECT n, 
                'NAME_' + Cast(n AS VARCHAR(20)) NAME 
         FROM   number 
         WHERE  n < 70000), 
     rand_names 
     AS (SELECT TOP 3000 * 
         FROM   all_names 
         ORDER  BY Newid()), 
     ordered_names 
     AS (SELECT Row_number() 
                  OVER ( 
                    ORDER BY NAME) rw_num, 
                NAME 
         FROM   rand_names) 
SELECT n1.NAME, 
       n2.NAME 
FROM   ordered_names n1 
       INNER JOIN ordered_names n2 
               ON n2.rw_num > n1.rw_num