Question

I want to write a query that checks, for each date, whether duplicate IDs were used.

Duplicates should be looked for only in the 60 days before the reference date, not after it. An example table is shown below.

CREATE TABLE SampleTable (
   pKey INT PRIMARY KEY,
   personalID INT NOT NULL,
   createDate DATETIME NOT NULL,
   value INT NULL
);

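The reference date is DATE(createDate), and the ID to check for duplicates is personalID. What I need comes down to two numbers: the count of today's rows and the count of duplicated rows. Today's count is easy to get: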

SELECT
   COUNT(*)
FROM SampleTable
WHERE
   DATE(createDate) = DATE(NOW())

Separately from today's data, the number of IDs that are duplicated within the last 60 days can be checked like this:

SELECT
   COUNT(*)
FROM (
   SELECT
      personalID,
      COUNT(*)
   FROM SampleTable
   WHERE
      DATEDIFF(NOW(), createDate) <= 60
   GROUP BY personalID HAVING COUNT(*) > 1
) AS T

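In short, for each date I want the total number of rows, plus the number of those rows whose personalID already appeared on an earlier day within the preceding 60 days.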

[Sample data]

pKey    personalID  createDate  value
1       1           2018-01-01  100
2       2           2018-01-01  300
3       3           2018-01-01  500
7       1           2018-01-02  100
8       2           2018-01-02  200
9       3           2018-01-02  200
10      4           2018-01-02  100
11      5           2018-01-02  100
12      3           2018-01-03  200
13      4           2018-01-03  100
14      5           2018-01-03  100
15      6           2018-01-03  50

[Desired result]

date        totalCount  duplicated
2018-01-01  3           0
2018-01-02  5           3
2018-01-03  4           3
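The following plain-Python sketch pins down the counting rule behind the desired result. It is an illustration and not part of the question. It assumes that a row counts as duplicated when the same personalID also has a row on an earlier day no more than 60 days back. The names `ROWS` and `daily_duplicates` are hypothetical, and the `value` column is dropped because it plays no role.

```python
from datetime import date, timedelta

# Sample rows from the question: (pKey, personalID, createDate)
ROWS = [
    (1, 1, date(2018, 1, 1)), (2, 2, date(2018, 1, 1)), (3, 3, date(2018, 1, 1)),
    (7, 1, date(2018, 1, 2)), (8, 2, date(2018, 1, 2)), (9, 3, date(2018, 1, 2)),
    (10, 4, date(2018, 1, 2)), (11, 5, date(2018, 1, 2)),
    (12, 3, date(2018, 1, 3)), (13, 4, date(2018, 1, 3)),
    (14, 5, date(2018, 1, 3)), (15, 6, date(2018, 1, 3)),
]

WINDOW = timedelta(days=60)  # look back 60 days, never forward


def daily_duplicates(rows, window=WINDOW):
    """Return {day: (totalCount, duplicated)}.

    A row counts as duplicated when the same personalID has a row on an
    earlier day that is at most `window` before the row's day.
    """
    # Collect every day on which each personalID appears.
    days_by_id = {}
    for _, pid, day in rows:
        days_by_id.setdefault(pid, set()).add(day)

    result = {}
    for _, pid, day in rows:
        total, dups = result.get(day, (0, 0))
        # Only strictly earlier days inside the window count.
        seen_before = any(day - window <= d < day for d in days_by_id[pid])
        result[day] = (total + 1, dups + int(seen_before))
    return result


for day, (total, dups) in sorted(daily_duplicates(ROWS).items()):
    print(day, total, dups)
```

With the sample rows it prints `2018-01-01 3 0`, `2018-01-02 5 3` and `2018-01-03 4 3`, which matches the desired result.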

Answer 1

If you want the IDs that have more than one row in the last 60 days:

select personalID
from SampleTable
where createDate >= curdate() - interval 60 day
group by personalID
having count(*) >= 2;

If you also require that the ID appears on the most recent date, i.e. today:

select personalID
from SampleTable
where createDate >= curdate() - interval 60 day
group by personalID
having count(*) >= 2 and date(max(createDate)) = curdate();
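Below is a sketch of the two queries above, run against the question's sample data. It uses Python's built-in `sqlite3` instead of MySQL. SQLite has no `curdate() - interval 60 day`, so `date(:ref, '-60 days')` stands in for it. `CURDATE()` is replaced by a hypothetical fixed date, `REF_DAY`, so that the output is reproducible.

```python
import sqlite3

# Question's sample rows: (pKey, personalID, createDate, value)
ROWS = [
    (1, 1, "2018-01-01", 100), (2, 2, "2018-01-01", 300), (3, 3, "2018-01-01", 500),
    (7, 1, "2018-01-02", 100), (8, 2, "2018-01-02", 200), (9, 3, "2018-01-02", 200),
    (10, 4, "2018-01-02", 100), (11, 5, "2018-01-02", 100),
    (12, 3, "2018-01-03", 200), (13, 4, "2018-01-03", 100),
    (14, 5, "2018-01-03", 100), (15, 6, "2018-01-03", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SampleTable (pKey INTEGER PRIMARY KEY, personalID INTEGER NOT NULL,"
    " createDate TEXT NOT NULL, value INTEGER)"
)
conn.executemany("INSERT INTO SampleTable VALUES (?, ?, ?, ?)", ROWS)

# Hypothetical stand-in for CURDATE().
REF_DAY = "2018-01-03"

# IDs with two or more rows in the 60 days up to REF_DAY.
ids_in_window = [r[0] for r in conn.execute(
    """
    SELECT personalID
    FROM SampleTable
    WHERE date(createDate) >= date(:ref, '-60 days')
    GROUP BY personalID
    HAVING COUNT(*) >= 2
    ORDER BY personalID
    """,
    {"ref": REF_DAY},
)]

# Same, but the ID must also appear on REF_DAY itself.
ids_also_today = [r[0] for r in conn.execute(
    """
    SELECT personalID
    FROM SampleTable
    WHERE date(createDate) >= date(:ref, '-60 days')
    GROUP BY personalID
    HAVING COUNT(*) >= 2 AND date(MAX(createDate)) = :ref
    ORDER BY personalID
    """,
    {"ref": REF_DAY},
)]

print(ids_in_window)   # [1, 2, 3, 4, 5]
print(ids_also_today)  # [3, 4, 5]
```

These queries return a list of IDs, not per-date counts. That is why the edit below moves to a correlated subquery.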

Edit:

This seems to be what you want, assuming there are no duplicates within a single day:

select date(sd.createDate) as createDay,
       count(*) as num_persons,
       sum(sd.num_dups > 0) as num_dups
from (select s.*,
             (select count(*)
              from SampleTable s2
              where s2.personalID = s.personalID and
                    date(s2.createDate) < date(s.createDate) and
                    date(s2.createDate) >= date(s.createDate) - interval 60 day
             ) as num_dups
      from SampleTable s
     ) sd
group by date(sd.createDate);
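To check the correlated-subquery approach against the desired result, here is a sketch that runs it in SQLite through Python's `sqlite3` module. This is a port and not the MySQL query itself. `date(x, '-60 days')` replaces `date(x) - interval 60 day`, and `SUM(num_dups > 0)` still works because SQLite comparisons evaluate to 0 or 1. The name `QUERY` is hypothetical.

```python
import sqlite3

# Question's sample rows: (pKey, personalID, createDate, value)
ROWS = [
    (1, 1, "2018-01-01", 100), (2, 2, "2018-01-01", 300), (3, 3, "2018-01-01", 500),
    (7, 1, "2018-01-02", 100), (8, 2, "2018-01-02", 200), (9, 3, "2018-01-02", 200),
    (10, 4, "2018-01-02", 100), (11, 5, "2018-01-02", 100),
    (12, 3, "2018-01-03", 200), (13, 4, "2018-01-03", 100),
    (14, 5, "2018-01-03", 100), (15, 6, "2018-01-03", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SampleTable (pKey INTEGER PRIMARY KEY, personalID INTEGER NOT NULL,"
    " createDate TEXT NOT NULL, value INTEGER)"
)
conn.executemany("INSERT INTO SampleTable VALUES (?, ?, ?, ?)", ROWS)

# For each row, count earlier-day rows of the same personalID in the
# 60-day window, then aggregate per day.
QUERY = """
SELECT date(sd.createDate)  AS day,
       COUNT(*)             AS totalCount,
       SUM(sd.num_dups > 0) AS duplicated
FROM (SELECT s.*,
             (SELECT COUNT(*)
              FROM SampleTable s2
              WHERE s2.personalID = s.personalID
                AND date(s2.createDate) <  date(s.createDate)
                AND date(s2.createDate) >= date(s.createDate, '-60 days')
             ) AS num_dups
      FROM SampleTable s) AS sd
GROUP BY date(sd.createDate)
ORDER BY day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

It prints `('2018-01-01', 3, 0)`, `('2018-01-02', 5, 3)` and `('2018-01-03', 4, 3)`, which agrees with the desired result.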

Answer 2

You can also use a self-join to find this kind of data. This approach also works when you need to find duplicate IDs by comparing against earlier dates.

CREATE TABLE Testtbl (pkey INT, personalID INT, createddate DATE, value INT);

INSERT INTO Testtbl VALUES
(1,  1, '2018-01-01', 100),
(2,  2, '2018-01-01', 300),
(3,  3, '2018-01-01', 500),
(4,  1, '2018-01-02', 100),
(5,  2, '2018-01-02', 200),
(6,  3, '2018-01-02', 200),
(7,  4, '2018-01-02', 100),
(8,  5, '2018-01-02', 100),
(9,  3, '2018-01-03', 200),
(14, 3, '2018-01-03', 500),
(10, 4, '2018-01-03', 100),
(11, 5, '2018-01-03', 100),
(12, 6, '2018-01-03', 50),
(13, 6, '2018-01-03', 100);

Query: the left join finds the duplicated data without losing the total count. The main point is to make sure the same ID is not counted twice.

select t.createddate,
       count(distinct t.pkey) as TotalCount,
       -- t1.createddate is not aggregated here, so MySQL rejects this
       -- CASE when ONLY_FULL_GROUP_BY is enabled
       case when t.createddate > t1.createddate
            then count(distinct t1.personalID)
                 + case when t.createddate = t1.createddate
                         and t.personalID = t1.personalID
                         and t.pkey != t1.pkey
                        then count(distinct t1.personalID)
                        else 0 end
            else 0
       end as Duplicated
from Testtbl t
left join Testtbl t1
       on t.personalID = t1.personalID
      and t.createddate >= t1.createddate
      and t.pkey != t1.pkey
      -- t1 is on or before t, so measure the gap as t minus t1
      and datediff(t.createddate, t1.createddate) <= 60
group by t.createddate;

Output:

createddate  TotalCount Duplicated
2018-01-01     3         0
2018-01-02     5         3
2018-01-03     6         5
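The sketch below shows a simplified variant of the left-join idea. It is not the query above. It counts a row as duplicated only if a matching personalID exists on an earlier day within 60 days, which is the question's definition. Same-day repeats are deliberately ignored. It runs in SQLite through Python's `sqlite3` on the question's sample data, not on Testtbl, and `date(x, '-60 days')` replaces MySQL's `DATEDIFF`. The name `QUERY` is hypothetical.

```python
import sqlite3

# Question's sample rows: (pKey, personalID, createDate, value)
ROWS = [
    (1, 1, "2018-01-01", 100), (2, 2, "2018-01-01", 300), (3, 3, "2018-01-01", 500),
    (7, 1, "2018-01-02", 100), (8, 2, "2018-01-02", 200), (9, 3, "2018-01-02", 200),
    (10, 4, "2018-01-02", 100), (11, 5, "2018-01-02", 100),
    (12, 3, "2018-01-03", 200), (13, 4, "2018-01-03", 100),
    (14, 5, "2018-01-03", 100), (15, 6, "2018-01-03", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SampleTable (pKey INTEGER PRIMARY KEY, personalID INTEGER NOT NULL,"
    " createDate TEXT NOT NULL, value INTEGER)"
)
conn.executemany("INSERT INTO SampleTable VALUES (?, ?, ?, ?)", ROWS)

# LEFT JOIN keeps every row of t for the total; t1 only matches earlier-day
# rows of the same personalID. COUNT(DISTINCT t.pKey) stops a row with
# several earlier matches from being counted more than once.
QUERY = """
SELECT date(t.createDate)     AS day,
       COUNT(DISTINCT t.pKey) AS totalCount,
       COUNT(DISTINCT CASE WHEN t1.pKey IS NOT NULL THEN t.pKey END) AS duplicated
FROM SampleTable t
LEFT JOIN SampleTable t1
       ON t1.personalID = t.personalID
      AND date(t1.createDate) <  date(t.createDate)
      AND date(t1.createDate) >= date(t.createDate, '-60 days')
GROUP BY date(t.createDate)
ORDER BY day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the question's sample data this yields `('2018-01-01', 3, 0)`, `('2018-01-02', 5, 3)` and `('2018-01-03', 4, 3)`. Without the distinct count, row 12 (personalID 3, which has two earlier matches) would be counted twice.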

Counting duplicated rows per reference date
