Checking for duplicates before processing files

Time: 2012-06-26 10:20:45

Tags: sql-server stored-procedures ssis

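I am using SSIS to iterate over a folder and read the contents of every txt file into a database. Before loading a file, I check whether it has already been processed, using this stored procedure:
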
ALTER PROC [dbo].[CheckForDuplicateFileEntry]
(
  @TaskID INT,
  @Filename VARCHAR(50),
  @FileAlreadyExists BIT OUTPUT
)AS 
BEGIN

    DECLARE @TaskTypeID INT
    SET @FileAlreadyExists = 0

    SELECT @TaskTypeID = TaskTypeID FROM Tasks WHERE TaskID = @TaskID

    IF EXISTS ( SELECT  TaskID
                FROM    TaskSteps
                WHERE   @Filename IN (
                        SELECT TOP ( 30 )
                                TaskSteps.Filename
                        FROM    TaskSteps
                                INNER JOIN Tasks ON TaskSteps.TaskID = Tasks.TaskID
                        WHERE   ( Tasks.TaskTypeID = @TaskTypeID
                                  AND [Filename] IS NOT NULL
                                )
                                AND IsValid = 1
                                AND ProcessStatusID = 2 ) ) 
        BEGIN              
            INSERT  INTO TaskSteps ( TaskID, StepDesc )
            VALUES  (
                      @TaskID,
                      'Duplicate filename. (' + @Filename + ') Already exists.' 
                    )                
            SET @FileAlreadyExists = 1                  
        END        
END

I have also tried:

IF EXISTS (SELECT TOP 30 Filename
       FROM  TaskSteps  INNER JOIN Tasks ON TaskSteps.TaskID = Tasks.TaskID
       WHERE (SUBSTRING(TaskSteps.Filename,18,13) = SUBSTRING(@Filename,18,13))
             AND IsValid = 1                            
             AND ProcessStatusID = 2)

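But as it loops through the files, it processes the first one, declares the next a duplicate, processes the third, declares the fourth a duplicate, and so on. The filenames are very similar, e.g. Songs_120501_175535.txt. The name prefix stays the same and the date and time portion is the only part that changes, possibly by just a single digit, e.g. Songs_120502_175535.txt.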
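
As a quick check of the second attempt, the sketch below shows what SUBSTRING(@Filename, 18, 13) extracts from the two sample names quoted above. It assumes the real filenames have exactly this shape; if the actual names are longer, the offsets will land elsewhere. The variable names @A and @B are only for illustration.

```sql
-- Sample names taken from the question; real names may be longer.
DECLARE @A VARCHAR(50), @B VARCHAR(50);
SET @A = 'Songs_120501_175535.txt';
SET @B = 'Songs_120502_175535.txt';

-- The names are 23 characters long, so starting at position 18
-- leaves only the last two digits of the time plus the extension.
SELECT LEN(@A)               AS NameLength,  -- 23
       SUBSTRING(@A, 18, 13) AS KeyA,        -- '35.txt'
       SUBSTRING(@B, 18, 13) AS KeyB;        -- '35.txt'
```

With names of this shape, two files from different dates compare as equal, which may be one reason distinct files are reported as duplicates.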

2 Answers:

Answer 0: (score: 1)

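This is as simple as I can make it. I would check whether anything else in the SSIS process, or a trigger, is deleting or updating the TaskSteps / Tasks tables, which could make the check work sometimes and fail at other times. Also check the IsValid and ProcessStatusID columns, since both are used in the query. Try running the procedure in Management Studio with the same parameters three, four, five times, and print out the @FileAlreadyExists value after each run. If calling the proc on its own works fine, the problem lies elsewhere in SSIS.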

ALTER PROC [dbo].[CheckForDuplicateFileEntry]
(
  @TaskID INT,
  @Filename VARCHAR(50),
  @FileAlreadyExists BIT OUTPUT
)AS 
BEGIN

SET @FileAlreadyExists = 0

IF EXISTS( 
           SELECT     * 
           FROM       TaskSteps
           INNER JOIN Tasks 
           ON         TaskSteps.TaskID = Tasks.TaskID
           WHERE      Tasks.TaskTypeID = TaskSteps.TaskTypeID  -- assumes TaskSteps has a TaskTypeID column
           AND        Tasks.TaskID = TaskSteps.TaskID          -- repeats the join condition above
           AND        IsValid = 1
           AND        ProcessStatusID = 2
           AND        TaskSteps.Filename = @FileName
         )
    BEGIN              
        INSERT  INTO TaskSteps ( TaskID, StepDesc )
        VALUES  (
                  @TaskID,
                  'Duplicate filename. (' + @Filename + ') Already exists.' 
                )                
        SET @FileAlreadyExists = 1                  
    END        
END
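
The repeated-call test suggested in this answer could be sketched as follows, run from a Management Studio query window. The @TaskID value of 1 is a placeholder and should be replaced with a real TaskID; the filename is the sample from the question. Inline initialization in DECLARE assumes SQL Server 2008 or later.

```sql
DECLARE @Exists BIT, @Run INT = 1;

WHILE @Run <= 5
BEGIN
    -- Call the proc with identical parameters on every pass.
    EXEC dbo.CheckForDuplicateFileEntry
         @TaskID            = 1,                         -- placeholder TaskID
         @Filename          = 'Songs_120501_175535.txt',
         @FileAlreadyExists = @Exists OUTPUT;

    PRINT 'Run ' + CAST(@Run AS VARCHAR(10))
        + ': @FileAlreadyExists = ' + CAST(@Exists AS VARCHAR(1));

    SET @Run = @Run + 1;
END
```

Note that every run that finds a duplicate also inserts a 'Duplicate filename' row into TaskSteps, so it may be worth running this against a test copy of the data.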

Answer 1: (score: 1)

Try this, which uses the CHARINDEX function to locate the first underscore instead of a hard-coded position:

IF EXISTS (SELECT TOP 30 Filename
           FROM  TaskSteps  INNER JOIN Tasks ON TaskSteps.TaskID = Tasks.TaskID
           WHERE (SUBSTRING(TaskSteps.Filename,CHARINDEX('_',TaskSteps.Filename),13) =    
                  SUBSTRING(@Filename,CHARINDEX('_',@Filename),13) ) 
                 AND IsValid = 1                            
                 AND ProcessStatusID = 2)
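
To see what the CHARINDEX-based key actually contains, the sketch below applies it to the sample name from the question and to a second, hypothetical name that differs only in the last digit of the time. It assumes every filename has the shape Songs_yymmdd_hhmmss.txt; the variables @A and @B are only for illustration.

```sql
DECLARE @A VARCHAR(50), @B VARCHAR(50);
SET @A = 'Songs_120501_175535.txt';
SET @B = 'Songs_120501_175536.txt';  -- hypothetical name, one second later

SELECT CHARINDEX('_', @A)                        AS UnderscorePos,  -- 6
       -- Starting AT the underscore, 13 characters stop one digit short.
       SUBSTRING(@A, CHARINDEX('_', @A), 13)     AS Key13A,         -- '_120501_17553'
       SUBSTRING(@B, CHARINDEX('_', @B), 13)     AS Key13B,         -- '_120501_17553'
       -- Starting one character AFTER it covers the full date and time.
       SUBSTRING(@A, CHARINDEX('_', @A) + 1, 13) AS Stamp13A,       -- '120501_175535'
       SUBSTRING(@B, CHARINDEX('_', @B) + 1, 13) AS Stamp13B;       -- '120501_175536'
```

Under that naming assumption, starting at CHARINDEX('_', ...) + 1 (or keeping the start and using a length of 14) would be needed for the comparison to include the last digit of the time.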