Looping through groups of records

Asked: 2018-09-21 07:14:04

Tags: sql sql-server tsql

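SQL Server 2014. I have a table containing a number of rows, say 15, of which 5 have a groupid of 736881 and 10 have a groupid of 3084235. What I want to do is process each group of records in turn and load the results into a table.

I have written code that does this, but I think I have set up the loop counter incorrectly, because I keep getting the records for group ID 736881 twice.

Because the data contains personal information I can't post test data at the moment, but if the error isn't obvious I will try to create some dummy data.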

SELECT @LoopCounter = min(rowfilter) , @maxrowfilter = max(rowfilter) 
FROM peops6

WHILE ( @LoopCounter IS NOT NULL
        AND  @LoopCounter <= @maxrowfilter)

begin

declare @customer_dist as Table (
    [id] [int] NOT NULL,
    [First_Name] [varchar](50) NULL,
    [Last_Name] [varchar](50) NULL,
    [DoB] [date] NULL,
    [post_code] [varchar](50) NULL,
    [mobile] [varchar](50) NULL,
    [Email] [varchar](100) NULL );


INSERT INTO @customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
select id, first_name, last_name, dob, postcode, mobile_phone, email  from peops6 where rowfilter = @LoopCounter

insert into results
SELECT result.* ,
       [dbo].GetPercentageOfTwoStringMatching(result.DoB, d.DoB) [DOB%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.post_code, d.post_code) [post_code%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.mobile, d.mobile) [mobile%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.Email, d.Email) [email%match]
 FROM   (   SELECT (   SELECT MIN(id)
                      FROM   @customer_dist AS sq
                      WHERE  sq.First_Name = cd.First_Name
                             AND sq.Last_Name = cd.Last_Name
                             AND (   sq.DoB = cd.DoB  
                                     OR sq.mobile = cd.mobile
                                     OR sq.Email = cd.Email
                                     OR sq.post_code = cd.post_code )) nid ,
                  *
           FROM   @customer_dist AS cd ) AS result
       INNER JOIN @customer_dist d ON result.nid = d.id order by 1, 2 asc;

SELECT @LoopCounter  = min(rowfilter) FROM peops6
   WHERE rowfilter > @LoopCounter

end 

2 answers:

Answer 0 (score: 0)

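You need to empty the table variable (@customer_dist) at the end of each loop iteration. A table variable is scoped to the whole batch, not to the loop body, so the DECLARE inside the WHILE does not recreate it and the rows from earlier groups are still there on the next pass:

....
-- Add this: remove the rows left over from the group just processed
DELETE FROM @customer_dist;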

SELECT @LoopCounter  = min(rowfilter) FROM peops6
   WHERE rowfilter > @LoopCounter

end

See: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/42ef20dc-7ad8-44f7-b676-a4596fc0d593/declaring-a-table-variable-inside-a-loop-does-not-delete-the-previous-data?forum=transactsql
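
The accumulation described in this answer can be reproduced without the asker's data. What follows is a minimal sketch, not the original code: @i and @t are hypothetical names chosen for illustration, and the loop simply counts the rows in a table variable that is declared inside the WHILE body.

```sql
-- Hypothetical repro: @i and @t are illustrative names, not from the question.
DECLARE @i int = 1;

WHILE @i <= 3
BEGIN
    -- Declared inside the loop, but scoped to the batch:
    -- the same @t (and its rows) is reused on every iteration.
    DECLARE @t TABLE (n int NOT NULL);

    INSERT INTO @t (n) VALUES (@i);

    -- Without the DELETE below, rows_in_t grows by one per iteration.
    SELECT @i AS iteration, COUNT(*) AS rows_in_t FROM @t;

    -- Uncomment to start each iteration with an empty table variable:
    -- DELETE FROM @t;

    SET @i += 1;
END
```

With the DELETE uncommented, every iteration should see only the row it inserted itself, which is the behaviour the question's loop relies on. Moving the DECLARE above the WHILE would make that scope explicit, though it does not change the behaviour.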

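Answer 1 (score: 0)

I'm not sure you need a loop, such as a SQL cursor, to complete this task.

Please check the following SQL statement, in which I use multiple CTE expressions. Correlating the subquery on rowfilter keeps the matching within each group, so all groups are processed in a single statement: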

with customer_dist as (
    select
        rowfilter,
        id, first_name, last_name, dob, postcode, mobile_phone, email
    from peops6 
), result as (
    SELECT
        (
        SELECT
            MIN(id)
        FROM customer_dist AS sq
        WHERE 
            sq.rowfilter  = cd.rowfilter
        AND sq.First_Name = cd.First_Name
        AND sq.Last_Name  = cd.Last_Name
        AND (sq.DoB = cd.DoB OR sq.mobile_phone = cd.mobile_phone OR sq.Email = cd.Email OR sq.postcode = cd.postcode )
        ) nid,
        *
    FROM customer_dist AS cd
)
SELECT 
    result.* ,
    [dbo].edit_distance(result.DoB, d.DoB) [DOB%match] ,
    [dbo].edit_distance(result.postcode, d.postcode) [post_code%match] ,
    [dbo].edit_distance(result.mobile_phone, d.mobile_phone) [mobile%match] ,
    [dbo].edit_distance(result.Email, d.Email) [email%match]
FROM result
INNER JOIN customer_dist d 
    ON result.nid = d.id 
order by 1, 2 asc;

Note that in this example I used a fuzzy string matching function based on the Levenshtein distance algorithm (edit_distance) instead of your function.

The result is as follows: (image omitted)

You only need to add an INSERT statement before the final SELECT statement.
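
To show where the INSERT goes relative to the CTEs, here is a reduced sketch. It assumes hypothetical table variables @src and @dst standing in for peops6 and results, and it trims the columns to the minimum needed to illustrate the placement; it is not the full query above.

```sql
-- Hypothetical stand-ins for peops6 (@src) and results (@dst).
DECLARE @src TABLE (rowfilter int, id int, first_name varchar(50));
DECLARE @dst TABLE (rowfilter int, nid int, id int, first_name varchar(50));

INSERT INTO @src (rowfilter, id, first_name)
VALUES (1, 10, 'Ann'), (1, 11, 'Ann'), (2, 20, 'Bob');

-- The WITH clause comes first; the INSERT sits between the CTE
-- definitions and the final SELECT. The preceding statement must end
-- with a semicolon for WITH to be parsed as a CTE.
WITH result AS (
    SELECT cd.rowfilter,
           (SELECT MIN(sq.id)
            FROM   @src AS sq
            WHERE  sq.rowfilter  = cd.rowfilter
                   AND sq.first_name = cd.first_name) AS nid,
           cd.id,
           cd.first_name
    FROM   @src AS cd
)
INSERT INTO @dst (rowfilter, nid, id, first_name)
SELECT rowfilter, nid, id, first_name
FROM   result;

SELECT * FROM @dst ORDER BY rowfilter, id;
```

Listing the target columns explicitly, rather than relying on `insert into results` with `result.*`, guards against column-order mismatches. An ORDER BY on the INSERT's SELECT does not determine the order in which rows are later read back, so the ordering belongs on the query that reads the results table.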

Hope it helps.