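SQL doesn't work for large samples

Time: 2017-03-17 14:44:52

Tags: sql sql-server tsql

I am trying to solve a challenge and have come up with a solution. The solution I wrote works for small data sets but does not seem to work for larger ones. Can someone help me find where I am going wrong?

I am having trouble calculating the unique users for each day (the second column in the output). The rest of the logic works fine.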

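Julia conducted a 15-day SQL contest. The start date of the contest was March 1, 2016, and the end date was March 15, 2016.

Write a query to print the total number of unique hackers who made at least one submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made the maximum number of submissions each day. If more than one such hacker has the maximum number of submissions, print the lowest hacker_id. The query should print this information for each day of the contest, sorted by date.

Input Format

The following tables hold the contest data:

Hackers: hacker_id is the id of the hacker, and name is the name of the hacker.

[image]

Submissions: submission_date is the date of the submission, submission_id is the id of the submission, hacker_id is the id of the hacker who made the submission, and score is the score of the submission.

[image]

Sample Input

For the following sample input, assume that the end date of the contest was March 6, 2016.

Hackers table: [image]

Submissions table: [image]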

**Explanation :-**

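On March 1, 2016, four hackers made submissions, so there are 4 unique hackers who made at least one submission each day. As each hacker made one submission, hacker 20703 (the lowest hacker_id) is considered the hacker who made the maximum number of submissions on this day. The name of the hacker is Angela.

On March 2, 2016, hackers made submissions again. Only 2 of them had submitted on every day so far, so there are 2 unique hackers who made at least one submission each day. Hacker 79722 made the most submissions, and the name of the hacker is Michael.

On March 3, 2016, hackers made submissions again. Still only 2 of them had submitted on every day so far, so there are 2 unique hackers who made at least one submission each day. As each hacker made one submission, hacker 20703 is considered the hacker who made the maximum number of submissions on this day. The name of the hacker is Angela.

On March 4, 2016, hackers made submissions again. Still only 2 of them had submitted on every day so far, so there are 2 unique hackers who made at least one submission each day. As each hacker made one submission, hacker 20703 is considered the hacker who made the maximum number of submissions on this day. The name of the hacker is Angela.

On March 5, 2016, hackers made submissions again. Only 1 of them had submitted on every day so far, so there is only 1 unique hacker who made at least one submission each day. Hacker 36396 made the most submissions, and the name of the hacker is Frank.

On March 6, 2016, only hacker 20703 made a submission, so there is only 1 unique hacker who made at least one submission each day. Hacker 20703 made the most submissions, and the name of the hacker is Angela.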

Sample Output

2016-03-01 4 20703 Angela
2016-03-02 2 79722 Michael
2016-03-03 2 20703 Angela
2016-03-04 2 20703 Angela
2016-03-05 1 36396 Frank
2016-03-06 1 20703 Angela

Schema & Data :-

http://sqlfiddle.com/#!9/844928

Solution :-


SELECT A.submission_date, A.cnt, B.hacker_id, B.name 
  FROM
    (
        SELECT submission_date, COUNT( DISTINCT hacker_id ) AS cnt
          FROM submissions
         WHERE submission_date = '2016-03-01'
         GROUP BY submission_date 
        UNION ALL
        SELECT submission_date, COUNT( DISTINCT hacker_id )
          FROM
            (
                SELECT DATEADD(day, 1, convert( date, A.submission_date ))  AS submission_date, A.hacker_id
                  FROM 
                    (
                       SELECT submission_date, hacker_id
                         FROM submissions
                       GROUP BY submission_date, hacker_id
                     ) A
                INNER  JOIN  
                    (
                         SELECT DATEADD(day, -1, convert( date, submission_date )) AS new_submission_date, hacker_id
                           FROM submissions
                          GROUP BY DATEADD(day, -1, convert( date, submission_date )) , hacker_id
                     ) B
              ON A.submission_date = B.new_submission_date
             AND A.hacker_id = B.hacker_id  
            ) Z
        GROUP BY submission_date
    ) A
INNER JOIN 
(
    SELECT s.submission_date, s.hacker_id, h.name
      FROM
    (
        SELECT submission_date, hacker_id 
          FROM
        ( 
            SELECT submission_date, hacker_id,cnt, ROW_NUMBER() OVER ( PARTITION BY submission_date ORDER BY cnt DESC, hacker_id ) AS rn
              FROM 
            (
             SELECT submission_date, hacker_id, COUNT(*) AS cnt
               FROM submissions
              GROUP BY submission_date, hacker_id
            ) Z
        ) Y
        WHERE rn = 1
    ) s
    INNER JOIN
    hackers h
    ON s.hacker_id = h.hacker_id
) B
ON A.submission_date = B.submission_date
;
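
One possible reason the count goes wrong on larger inputs: the second branch of the UNION ALL joins each day only to the day before it, so it counts hackers who submitted on two consecutive days, not hackers who submitted on every day since March 1. The sketch below contrasts the two definitions in Python. It is only an illustration under a few assumptions: the rows are the small test set from the last answer in this thread (hacker ids 1 to 5 are that answer's test values, not the real contest data), every contest day has at least one submission, and the function names are made up for this example.

```python
from collections import defaultdict

# (submission_date, hacker_id) pairs; test values borrowed from the sample
# data in the last answer, not from the real contest tables.
SUBMISSIONS = [
    ("2016-03-01", 1), ("2016-03-01", 2), ("2016-03-01", 3), ("2016-03-01", 4),
    ("2016-03-02", 2), ("2016-03-02", 2), ("2016-03-02", 3), ("2016-03-02", 4),
    ("2016-03-02", 3),
    ("2016-03-03", 5), ("2016-03-03", 1), ("2016-03-03", 2), ("2016-03-03", 4),
    ("2016-03-03", 1),
    ("2016-03-04", 1), ("2016-03-04", 2), ("2016-03-04", 5), ("2016-03-04", 2),
]


def hackers_by_day(rows):
    """Map each date to the set of hackers who submitted on it, in date order."""
    by_day = defaultdict(set)
    for day, hacker in rows:
        by_day[day].add(hacker)
    return dict(sorted(by_day.items()))


def consecutive_pair_counts(rows):
    """Day 1 as-is, then hackers seen on both the previous day and this day."""
    counts, previous = {}, None
    for day, hackers in hackers_by_day(rows).items():
        counts[day] = len(hackers if previous is None else hackers & previous)
        previous = hackers
    return counts


def every_day_counts(rows):
    """Hackers seen on every day from the first day through this day."""
    counts, streak = {}, None
    for day, hackers in hackers_by_day(rows).items():
        # Keep only hackers who are still unbroken after today.
        streak = set(hackers) if streak is None else streak & hackers
        counts[day] = len(streak)
    return counts


print(consecutive_pair_counts(SUBMISSIONS))
print(every_day_counts(SUBMISSIONS))
```

On this data the two definitions agree for the first three days and differ on 2016-03-04, where the pairwise version reports 3 and the every-day version reports 1. That kind of divergence only shows up once some hacker skips a day and comes back, which may be why small test sets pass.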

6 answers:

Answer 0 (score: 1)

select * from user where id in (select user_id from legalhold_system where system_id in(select id from system_user where system_id=:1))

Answer 1 (score: 0)

IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;
CREATE TABLE #Results
([Number of Hackers that had a Submission]                 INT,
 SubmissionDate                                            DATE,
 [Greatest # of Submissions by Hacker (lowest ID if tied)] INT,
 [Hacker Name with Most Submissions]                       VARCHAR(50)
);
DECLARE @CurrentDate DATE;
DECLARE my CURSOR
FOR SELECT DISTINCT
           submission_date
    FROM submissions;
OPEN my;
FETCH NEXT FROM my INTO @CurrentDate;
WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO #Results
               SELECT a.hackers [Number of Hackers that had a Submission],
                      a.SubmissionDate,
                      b.Submission_Count [Greatest # of Submissions by Hacker (lowest ID if tied)],
                      b.Hacker [Hacker Name with Most Submissions]
               FROM
               (
                   SELECT COUNT(DISTINCT hacker_ID) hackers,
                          @CurrentDate [SubmissionDate]
                   FROM submissions
                   WHERE submission_date = @CurrentDate
               ) a
               JOIN
               (
                   SELECT TOP 1 COUNT(submission_id) Submission_Count,
                                b.name [Hacker],
                                submission_date
                   FROM submissions a
                        JOIN hackers b ON a.hacker_id = b.hacker_id
                   WHERE a.submission_date = @currentDate
                   GROUP BY b.name,
                            a.hacker_id,
                            submission_date
                   ORDER BY COUNT(submission_id) DESC,
                            a.hacker_id
               ) b ON a.SubmissionDate = b.submission_date;
        FETCH NEXT FROM my INTO @CurrentDate;
    END;
CLOSE my;
DEALLOCATE my;
SELECT *
FROM #Results;

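I don't usually like using cursors, but this is quick for small data and easy to evaluate date by date.

Your results were close to mine but not the same. I didn't have time to diagnose your query, so use this result to compare and contrast.

Considering you posted this on March 17, I'm guessing (and hoping) that if this was homework, it is past due by now... and that I'm not helping you cheat...

Good luck!

Results:

[image]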

Answer 2 (score: 0)

Try the following query:

select submission_date ,( SELECT COUNT(distinct hacker_id)  
                    FROM Submissions s2  
                    WHERE s2.submission_date = s1.submission_date AND 
                    (SELECT COUNT(distinct s3.submission_date) 
                     FROM Submissions s3 
                     WHERE s3.hacker_id = s2.hacker_id AND  
     s3.submission_date < s1.submission_date) = dateDIFF(s1.submission_date , '2016-03-01')) ,

        (select hacker_id  from submissions s2 
         where s2.submission_date = s1.submission_date 
           group by hacker_id 
         order by count(submission_id) desc , hacker_id limit 1) as hack,
    (select name from hackers where hacker_id = hack)
    from 
    (select distinct submission_date from submissions) s1
    group by submission_date;

Answer 3 (score: 0)

Check the following query on rextester.com:

WITH
  a AS
  (
    SELECT
      submission_date,
      hacker_id,
      COUNT(*) AS submissions_by_hacker,
      DENSE_RANK() OVER (ORDER BY submission_date) AS sequence_number_by_date,
      DENSE_RANK() OVER
      (
        PARTITION BY hacker_id ORDER BY submission_date
      ) AS sequence_number_by_hacker,
      RANK() OVER
      (
        PARTITION BY submission_date ORDER BY count(*) DESC
      ) AS rank_by_hacker_submissions
    FROM #submissions
    GROUP BY submission_date, hacker_id
  ),
  b AS
  (
    SELECT
      *,
      MIN(IIF(rank_by_hacker_submissions = 1, hacker_id, NULL)) OVER
      (
        PARTITION BY submission_date
      ) AS min_hacker_id
    FROM a
  )
SELECT
  b.submission_date,
  h.hacker_id,
  COUNT(*) AS quantity_of_hackers_who_made_at_least_submission_each_day,
  h.name AS hacker_name
FROM b JOIN #hackers AS h ON b.min_hacker_id = h.hacker_id
WHERE b.sequence_number_by_date = b.sequence_number_by_hacker
GROUP BY b.submission_date, h.hacker_id, h.name
ORDER BY b.submission_date, h.hacker_id;

Output:

+---------------------+-----------+-----------------------------------------------------------+-------------+
|   submission_date   | hacker_id | quantity_of_hackers_who_made_at_least_submission_each_day | hacker_name |
+---------------------+-----------+-----------------------------------------------------------+-------------+
| 01.03.2016 00:00:00 |     20703 |                                                         4 | Angela      |
| 02.03.2016 00:00:00 |     79722 |                                                         2 | Michael     |
| 03.03.2016 00:00:00 |     20703 |                                                         2 | Angela      |
| 04.03.2016 00:00:00 |     20703 |                                                         2 | Angela      |
| 05.03.2016 00:00:00 |     36396 |                                                         1 | Frank       |
| 06.03.2016 00:00:00 |     20703 |                                                         1 | Angela      |
+---------------------+-----------+-----------------------------------------------------------+-------------+

Answer 4 (score: 0)

select big_1.submission_date, big_1.hkr_cnt, big_2.hacker_id, h.name
from
(select submission_date, count(distinct hacker_id) as hkr_cnt
from 
(select s.*, dense_rank() over(order by submission_date) as date_rank, 
dense_rank() over(partition by hacker_id order by submission_date) as hacker_rank 
from submissions s ) a 
where date_rank = hacker_rank 
group by submission_date) big_1 
join 
(select submission_date,hacker_id, 
rank() over(partition by submission_date order by sub_cnt desc, hacker_id) as max_rank 
from (select submission_date, hacker_id, count(*) as sub_cnt 
from submissions 
group by submission_date, hacker_id) b ) big_2
on big_1.submission_date = big_2.submission_date and big_2.max_rank = 1 
join hackers h on h.hacker_id = big_2.hacker_id 
order by 1;
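
The rank-equality filter used in the two answers above (`sequence_number_by_date = sequence_number_by_hacker`, `date_rank = hacker_rank`) rests on one observation: number the contest dates 1, 2, 3, ... and number each hacker's own submission dates the same way; the two numbers agree on a given date only if the hacker has not skipped any earlier date. Below is a rough Python sketch of that idea, not a translation of either query. It assumes every contest day has at least one submission, the function name is hypothetical, and the rows are the test values from the last answer in this thread.

```python
from collections import defaultdict

# (submission_date, hacker_id) pairs; test values from the last answer's
# sample data, not the real contest tables.
SUBMISSIONS = [
    ("2016-03-01", 1), ("2016-03-01", 2), ("2016-03-01", 3), ("2016-03-01", 4),
    ("2016-03-02", 2), ("2016-03-02", 2), ("2016-03-02", 3), ("2016-03-02", 4),
    ("2016-03-02", 3),
    ("2016-03-03", 5), ("2016-03-03", 1), ("2016-03-03", 2), ("2016-03-03", 4),
    ("2016-03-03", 1),
    ("2016-03-04", 1), ("2016-03-04", 2), ("2016-03-04", 5), ("2016-03-04", 2),
]


def every_day_counts_by_rank(rows):
    """Count, per date, the hackers whose own date rank equals the global date rank."""
    # Distinct (date, hacker) pairs, ordered by date, so each hacker's dates
    # are visited in ascending order.
    pairs = sorted(set(rows))
    # Plays the role of DENSE_RANK() OVER (ORDER BY submission_date).
    all_dates = sorted({day for day, _ in pairs})
    date_rank = {day: i for i, day in enumerate(all_dates, start=1)}
    # Plays the role of DENSE_RANK() OVER (PARTITION BY hacker_id ORDER BY submission_date).
    hacker_rank = defaultdict(int)
    counts = defaultdict(int)
    for day, hacker in pairs:
        hacker_rank[hacker] += 1
        if hacker_rank[hacker] == date_rank[day]:
            counts[day] += 1
    return dict(counts)


print(every_day_counts_by_rank(SUBMISSIONS))
```

As in the SQL versions, a date on which nobody has an unbroken streak simply does not appear in the result, so a complete report would need to handle that case separately.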

Answer 5 (score: 0)

Try the simple query below. It was tested with the sample data provided underneath it.

--- This CTE pulls the unique hackers who made at least 1 submission per day
WITH cte_c(submission_date,hacker_id) AS
(
SELECT submission_date,hacker_id FROM Submissions WHERE  submission_date = '2016-03-01'
UNION ALL
SELECT A.submission_date,A.hacker_id FROM Submissions A
JOIN cte_c B ON A.submission_date = DATEADD(dd,1,B.submission_date) and A.hacker_id = B.hacker_id
WHERE A.submission_date > '2016-03-01'
)
-- This CTE gives the hackers who made maximum submissions each day and assigns rank 1 to min(hacker_id)
,cte_h as
(
SELECT submission_date,hacker_id, ROW_NUMBER()OVER(PARTITION BY submission_date ORDER BY COUNT(*) DESC, hacker_id) rnk
FROM Submissions
GROUP BY submission_date,hacker_id
)
SELECT c.submission_date,c.hackers_per_day,h.hacker_id,ha.name 
FROM (SELECT submission_date, COUNT(DISTINCT hacker_id) as hackers_per_day FROM cte_c GROUP BY submission_date) C
JOIN cte_h H on c.submission_date = H.submission_date  and rnk = 1--and c.hacker_id = h.hacker_id
JOIN Hackers ha  ON h.hacker_id = ha.hacker_id
ORDER BY c.submission_date
------- Sample Data ---------------------------------------
create table Hackers
(
hacker_id int,
name varchar(10)
)

create table Submissions
(submission_date date,
hacker_id int)

insert into Hackers Values(1,'Test1'),(2,'Test2'),(3,'Test3'),(4,'Test4'),(5,'Test5')
insert into Submissions Values('2016-03-01',1),('2016-03-01',2),('2016-03-01',3),('2016-03-01',4),
('2016-03-02',2),('2016-03-02',2),('2016-03-02',3),('2016-03-02',4),('2016-03-02',3),
('2016-03-03',5),('2016-03-03',1),('2016-03-03',2),('2016-03-03',4),('2016-03-03',1),
('2016-03-04',1),('2016-03-04',2),('2016-03-04',5),('2016-03-04',2)
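
For comparison, here is what the sample rows above should produce if a query follows the problem statement. This is a Python sketch rather than T-SQL, and `contest_report` is a hypothetical helper written for this example. It assumes every contest day has at least one submission and that dates are ISO strings, so sorting them as text gives calendar order.

```python
from collections import Counter

# Test values copied from the sample data above.
HACKERS = {1: "Test1", 2: "Test2", 3: "Test3", 4: "Test4", 5: "Test5"}
SUBMISSIONS = [
    ("2016-03-01", 1), ("2016-03-01", 2), ("2016-03-01", 3), ("2016-03-01", 4),
    ("2016-03-02", 2), ("2016-03-02", 2), ("2016-03-02", 3), ("2016-03-02", 4),
    ("2016-03-02", 3),
    ("2016-03-03", 5), ("2016-03-03", 1), ("2016-03-03", 2), ("2016-03-03", 4),
    ("2016-03-03", 1),
    ("2016-03-04", 1), ("2016-03-04", 2), ("2016-03-04", 5), ("2016-03-04", 2),
]


def contest_report(hackers, submissions):
    """Return (date, every_day_hackers, top_hacker_id, top_hacker_name) per day."""
    per_day = {}
    for day, hacker in submissions:
        per_day.setdefault(day, Counter())[hacker] += 1

    report, streak = [], None
    for day in sorted(per_day):
        todays = per_day[day]
        # Hackers who have submitted on every day from the first day to today.
        streak = set(todays) if streak is None else streak & set(todays)
        # Most submissions first, lowest hacker_id on ties; the same ordering
        # as ROW_NUMBER() ... ORDER BY COUNT(*) DESC, hacker_id.
        top = min(todays, key=lambda h: (-todays[h], h))
        report.append((day, len(streak), top, hackers[top]))
    return report


for row in contest_report(HACKERS, SUBMISSIONS):
    print(*row)
# 2016-03-01 4 1 Test1
# 2016-03-02 3 2 Test2
# 2016-03-03 2 1 Test1
# 2016-03-04 1 2 Test2
```

The second column drops on 2016-03-04 because hackers 1 and 5 each missed an earlier day, so a query that only compares a day with the day before it would overcount there.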