SQL: finding duplicate records within the past 7 days

Date: 2015-07-21 20:12:02

Tags: sql sql-server duplicates

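Hello, I'm trying to find duplicate web visits within the past 7 days. I've built a query, but it takes too long to run. Any help optimizing it would be much appreciated. I'm using visitorguid to identify the duplicates.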

WITH TooClose AS
(
    SELECT
        a.visitid     AS BeforeID,
        b.visitID     AS AfterID,
        a.omniturecid AS [Before om id],
        b.omniturecid AS [after om id],
        a.pubid       AS [Before pub id],
        b.pubid       AS [after pub id],
        a.VisitorGuid AS [Before guid],
        b.VisitorGuid AS [after guid],
        a.date        AS [Before date],
        b.date        AS [after date]
    FROM
        webvisits a
        INNER JOIN WebVisits b ON a.VisitorGuid = b.VisitorGuid
                              AND a.date < b.Date
                              AND DATEDIFF(DAY, a.date, b.date) < 7
    WHERE a.Date >= '7/1/2015'
)
SELECT
    *
FROM
    TooClose
WHERE
    BeforeID NOT IN (SELECT AfterID FROM TooClose)

2 answers:

Answer 0 (score: 0)

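If I understand your question correctly, you are trying to find all duplicate web visits in the past 7 days. I'm not sure what counts as a duplicate web visit, but here is my attempt, which may work for you: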

;WITH q1
AS (
    SELECT a.VisitorGuid
          ,a.date
    FROM webvisits a
    WHERE a.DATE >= DATEADD(DAY, -7, cast(getdate() as date))
    )
,q2 AS
    (SELECT q1.VisitorGuid
           ,count(*) as rcount
       FROM q1
      GROUP BY q1.VisitorGuid
     )
SELECT q2.VisitorGuid
FROM q2 
WHERE q2.rcount > 1

SQL Fiddle Demo

Updated:

;WITH q1
AS (
    SELECT a.VisitorGuid
          ,a.date
          ,a.omniturecid 
    FROM webvisits a
    WHERE a.DATE >= DATEADD(DAY, -7, cast(getdate() as date))
    )
,q2 AS
    (SELECT q1.VisitorGuid
           ,count(*) as rcount
       FROM q1
      GROUP BY q1.VisitorGuid
      HAVING Count(*)> 1
     ) 
SELECT q1.VisitorGuid,
       q1.omniturecid,
       q1.date
  FROM q1
INNER JOIN q2 on q1.VisitorGuid = q2.VisitorGuid

SQL Fiddle Demo2
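
To try the updated query locally, here is a minimal sketch for SQL Server 2008 or later. The table variable @webvisits, its column types, and the sample rows are all invented for illustration; the real webvisits table has more columns, and its types are not given in the question.

```sql
-- Hypothetical sample data; column types are assumptions, not the asker's schema.
DECLARE @webvisits TABLE (
    visitid     INT IDENTITY(1,1) PRIMARY KEY,
    VisitorGuid UNIQUEIDENTIFIER NOT NULL,
    omniturecid VARCHAR(50) NULL,
    [date]      DATETIME NOT NULL
);

DECLARE @g1 UNIQUEIDENTIFIER = NEWID(), @g2 UNIQUEIDENTIFIER = NEWID();

INSERT INTO @webvisits (VisitorGuid, omniturecid, [date])
VALUES (@g1, 'cid-a', DATEADD(DAY, -1,  GETDATE())),  -- @g1 visits twice inside the window
       (@g1, 'cid-b', DATEADD(DAY, -3,  GETDATE())),
       (@g2, 'cid-c', DATEADD(DAY, -2,  GETDATE())),  -- @g2 visits once inside the window
       (@g2, 'cid-d', DATEADD(DAY, -30, GETDATE()));  -- outside the window, filtered out by q1

;WITH q1 AS (
    -- Visits from the last 7 days, counted from midnight.
    SELECT VisitorGuid, omniturecid, [date]
    FROM @webvisits
    WHERE [date] >= DATEADD(DAY, -7, CAST(GETDATE() AS date))
),
q2 AS (
    -- Visitors with more than one visit in that window.
    SELECT VisitorGuid
    FROM q1
    GROUP BY VisitorGuid
    HAVING COUNT(*) > 1
)
SELECT q1.VisitorGuid, q1.omniturecid, q1.[date]
FROM q1
INNER JOIN q2 ON q1.VisitorGuid = q2.VisitorGuid
ORDER BY q1.VisitorGuid, q1.[date];
```

With this sample data, only the two @g1 rows should come back, because @g2 has a single visit inside the 7-day window.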

Answer 1 (score: 0)

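As others have pointed out, it is a good idea to provide sample data, preferably as a sqlfiddle or as CREATE and INSERT statements. Here is one approach. I was too lazy to invent a structure and fill it with data to test it, so it may contain some errors: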

SELECT ... FROM (
    SELECT VisitorGuid
        ,  date
        , lead(date) over (partition by visitorguid
                           order by date) as next_date
        , omniturecid
        , lead(omniturecid) over (partition by visitorguid
                                  order by date) as next_omniturecid 
        , ...
        , lead(...) ...
    FROM webvisits a
) as x
WHERE DATEDIFF(DAY, date, next_date) < 7
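
A filled-in sketch of the LEAD approach above, assuming SQL Server 2012 or later (LEAD is not available before that). The table variable @webvisits, its column types, the sample rows, and the output column aliases are hypothetical; only VisitorGuid, omniturecid, and date are taken from the question, and the elided columns are left out rather than guessed.

```sql
-- Hypothetical sample data; column types are assumptions, not the asker's schema.
DECLARE @webvisits TABLE (
    visitid     INT IDENTITY(1,1) PRIMARY KEY,
    VisitorGuid UNIQUEIDENTIFIER NOT NULL,
    omniturecid VARCHAR(50) NULL,
    [date]      DATETIME NOT NULL
);

DECLARE @g1 UNIQUEIDENTIFIER = NEWID(), @g2 UNIQUEIDENTIFIER = NEWID();

INSERT INTO @webvisits (VisitorGuid, omniturecid, [date])
VALUES (@g1, 'cid-a', '2015-07-01T10:00:00'),
       (@g1, 'cid-b', '2015-07-03T09:00:00'),  -- 2 days after the previous @g1 visit
       (@g1, 'cid-c', '2015-07-20T09:00:00'),  -- 17 days later, not a repeat
       (@g2, 'cid-d', '2015-07-02T12:00:00');  -- single visit, so next_date is NULL

SELECT x.VisitorGuid,
       x.omniturecid      AS before_omniturecid,
       x.next_omniturecid AS after_omniturecid,
       x.[date]           AS before_date,
       x.next_date        AS after_date
FROM (
    -- Pair each visit with the same visitor's next visit in date order.
    SELECT VisitorGuid,
           omniturecid,
           [date],
           LEAD([date])      OVER (PARTITION BY VisitorGuid ORDER BY [date]) AS next_date,
           LEAD(omniturecid) OVER (PARTITION BY VisitorGuid ORDER BY [date]) AS next_omniturecid
    FROM @webvisits
) AS x
-- A NULL next_date makes DATEDIFF return NULL, so last visits drop out here.
WHERE DATEDIFF(DAY, x.[date], x.next_date) < 7;
```

With this sample data a single row is expected, pairing cid-a with cid-b. Each table row is read once and compared only with its immediate successor, which avoids the self-join used in the question.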