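Finding consecutive attendees across a series of events

Asked: 2018-07-25 17:49:18

Tags: c# sql sql-server

I'm trying to find a SQL-only solution to a problem that involves counting consecutive event attendance. The events take place on different dates, so I can't use any of the sequential-date methods to determine consecutive attendance. To calculate a person's consecutive attendance, I start with the most recent event and work back in time. I count each event the person attended, and when I hit an event the person did not attend, I stop. This gives me a count of the most recent consecutively attended events. Currently all the data is hosted in SQL tables; below is a sample schema with data: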

USERS
ID  UserName    MinutesWatched
--- --------    --------------
1   jdoe        30
2   ssmith      400
3   bbaker      350
4   tduke       285

EVENTS
ID  Name        StartDate
--  ----------- ---------
1   1st Event   07/15/2018
2   2nd Event   07/16/2018
3   3rd Event   07/18/2018
4   4th Event   07/20/2018

ATTENDANCE
ID  User_ID Event_ID
--  ------- --------
1   1   1
2   1   2
3   1   3
4   1   4
5   2   4
6   2   3
7   3   4
8   3   2
9   3   1
10  4   4
11  4   3
12  4   2

For the output, I'm trying to get:

OUTPUT
User_ID Consecutive   WatchedMinutes
------- -----------   --------------
1            4            30
2            2            400
3            1            350
4            3            285
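
The walk-back rule described above can be sketched in a few lines. This is an illustrative Python version over the sample rows only, not the production code; the function name `consecutive_streaks` is hypothetical, and the dates are the sample StartDate values rewritten as ISO strings so that they sort chronologically.

```python
# Sample data copied from the EVENTS and ATTENDANCE tables above.
# Dates are rewritten as ISO strings (an assumption made for easy sorting).
events = {1: "2018-07-15", 2: "2018-07-16", 3: "2018-07-18", 4: "2018-07-20"}
attendance = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 4), (2, 3),
              (3, 4), (3, 2), (3, 1), (4, 4), (4, 3), (4, 2)]  # (User_ID, Event_ID)


def consecutive_streaks(events, attendance):
    """Per user, count events attended in a row, starting from the latest event."""
    attended = {}
    for user_id, event_id in attendance:
        attended.setdefault(user_id, set()).add(event_id)

    # Event IDs ordered newest first by start date.
    newest_first = sorted(events, key=events.get, reverse=True)

    streaks = {}
    for user_id, seen in attended.items():
        count = 0
        for event_id in newest_first:
            if event_id not in seen:
                break  # the first missed event ends the streak
            count += 1
        streaks[user_id] = count
    return streaks


streaks = consecutive_streaks(events, attendance)
print(streaks)  # {1: 4, 2: 2, 3: 1, 4: 3}
```

The result matches the Consecutive column of the OUTPUT table; MinutesWatched would simply be joined in from USERS.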

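I've built C# code that does this iteratively, but it is slow when I'm dealing with 300,000+ users and hundreds of events. I'd love to see a SQL version of this.

Below is the method that calculates the top event viewers, as requested by Dan. The output is really just a string listing the top X event viewers.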

public string GetUsersTopWatchedConsecutiveStreams(int topUserCount)
{

    string results = "Top " + topUserCount + " consecutive viewers - ";
    Dictionary<ChatUser, int> userinfo = new Dictionary<ChatUser, int>();

    using (StorageModelContext db = new StorageModelContext())
    {
        IQueryable<ChatUser> allUsers = null;
        if (mainViewModel.CurrentStream != null)
            allUsers = db.ViewerHistory.Include("Stream").Include("User").Where(x => x.Stream.Id == mainViewModel.CurrentStream.Id).Select(x => x.User);
        else
            allUsers = db.ViewerHistory.Include("Stream").Include("User").Where(x => x.Stream.Id == (db.StreamHistory.OrderByDescending(s => s.StreamEnd).FirstOrDefault().Id)).Select(x => x.User);


        foreach (var u in allUsers)
        {
            int totalStreams = 0;
            var user = db.Users.Include("History").Where(x => x.UserName == u.UserName).FirstOrDefault();
            if (user != null)
            {
                var streams = user.History;
                if (streams != null)
                {
                    var allStreams = db.StreamHistory.OrderByDescending(x => x.StreamStart);
                    foreach (var s in allStreams)
                    {
                        var vs = streams.Where(x => x.Stream == s);
                        if (vs.Count() > 0)
                            totalStreams++;
                        else
                            break;
                    }
                }
            }
            userinfo.Add(u, totalStreams);
            totalStreams = 0;
        }

        var top = userinfo.OrderByDescending(x => x.Value).ThenByDescending(x => x.Key.MinutesWatched).Take(topUserCount);
        int cnt = 1;
        foreach (var t in top)
        {
            results += "#" + cnt + ": " + t.Key + "(" + t.Value.ToString() + "), ";
            cnt++;
        }
        if (cnt > 1)
            results = results.Substring(0, results.Length - 2);

    }
    return results;
}

mainViewModel.CurrentStream is null when no event is running. While a live event is in progress, it holds an object with information about that live event.

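2 Answers:

Answer 0: (score: 0)

It looks like you want the largest event ID that a person did not attend which is still smaller than the largest event ID that person did attend. You then want to count the events attended after it.

The following approach handles this by:

  • Joining all events to each user, up to that user's maximum attended event
  • Getting the largest event that has no matching attendance row
  • Keeping the rows after that event and counting them

This query returns, for each user, the ID of the latest event they missed: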

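-- Assumes event IDs increase with StartDate, so a larger ID means a later event.
select u.user_id, e.id as event_id,
       -- max (not sum) picks the latest event with no attendance row
       max(case when a.event_id is null then e.id end)
           over (partition by u.user_id) as max_nonmatch_event_id
from (select user_id, max(event_id) as max_event_id
      from attendance 
      group by user_id
     ) u join
     events e
     on e.id <= u.max_event_id left join
     attendance a
     on a.user_id = u.user_id and a.event_id = e.id
order by u.user_id, e.id;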

One more subquery wrapped around it should finish the job:

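-- Assumes event IDs are positive and increase with StartDate.
select ue.user_id, count(*) as num_consecutive
from (select u.user_id, e.id as event_id,
             -- latest missed event; 0 when the user missed nothing,
             -- so that the comparison below is never against NULL
             coalesce(max(case when a.event_id is null then e.id end)
                          over (partition by u.user_id), 0) as max_nonmatch_event_id
      from (select user_id, max(event_id) as max_event_id
            from attendance 
            group by user_id
           ) u join
           events e
           on e.id <= u.max_event_id left join
           attendance a
           on a.user_id = u.user_id and a.event_id = e.id
     ) ue
where ue.event_id > ue.max_nonmatch_event_id
group by ue.user_id;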
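
As a rough check, the approach above can be run against the sample data. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption: SQLite 3.25 or later is needed for window functions, and the table definitions are simplified). The query text here takes the latest missed event with `max`, joins on `u.user_id`, and falls back to 0 for users with no missed events; it also assumes event IDs increase chronologically.

```python
import sqlite3

# In-memory database loaded with the sample EVENTS and ATTENDANCE rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT, start_date TEXT);
    CREATE TABLE attendance (id INTEGER PRIMARY KEY, user_id INTEGER, event_id INTEGER);
    INSERT INTO events VALUES
        (1, '1st Event', '2018-07-15'), (2, '2nd Event', '2018-07-16'),
        (3, '3rd Event', '2018-07-18'), (4, '4th Event', '2018-07-20');
    INSERT INTO attendance (user_id, event_id) VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (2, 4), (2, 3),
        (3, 4), (3, 2), (3, 1), (4, 4), (4, 3), (4, 2);
""")

QUERY = """
select ue.user_id, count(*) as num_consecutive
from (select u.user_id, e.id as event_id,
             coalesce(max(case when a.event_id is null then e.id end)
                          over (partition by u.user_id), 0) as max_nonmatch_event_id
      from (select user_id, max(event_id) as max_event_id
            from attendance
            group by user_id) u
      join events e on e.id <= u.max_event_id
      left join attendance a on a.user_id = u.user_id and a.event_id = e.id
     ) ue
where ue.event_id > ue.max_nonmatch_event_id
group by ue.user_id
order by ue.user_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 4), (2, 2), (3, 1), (4, 3)]
```

Note that this counts back from each user's own latest attended event, so a user who skipped the most recent event still gets a non-zero streak.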

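Answer 1: (score: 0)

Maybe you'd like to try this:

Events are numbered in descending StartDate order (using ROW_NUMBER), and each user's attendances are likewise numbered in descending StartDate order. For consecutively attended events, the difference between the event number and the attendance number stays the same. I group by that difference, count the attendances in each group, and return the group with the smallest difference for each user: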

WITH
  evt (ID, StartDate, evt_no) AS (
    SELECT ID, StartDate,
      ROW_NUMBER() OVER (ORDER BY StartDate DESC)
    FROM EVENTS
  ),
  att ([User_ID], grp_no) AS (
    SELECT [User_ID], evt_no - 
      ROW_NUMBER() OVER (PARTITION BY [User_ID] ORDER BY StartDate DESC)
    FROM ATTENDANCE a
      INNER JOIN evt ON a.Event_ID = evt.ID
  ),
  con ([User_ID], Consecutive, rn) AS (
    SELECT [User_ID], COUNT(*),
      ROW_NUMBER() OVER (PARTITION BY User_ID ORDER BY grp_no)
    FROM att
    GROUP BY [User_ID], grp_no
  )
SELECT u.ID AS [User_ID], u.UserName, u.MinutesWatched, con.Consecutive
FROM con
  INNER JOIN USERS u ON con.[User_ID] = u.ID
WHERE con.rn = 1;

I'd be interested to know how long this query takes to run on your system.
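
To see why the difference stays constant within a streak, the numbering can be reproduced by hand. This is a small Python sketch over the sample data, not part of the query itself; `latest_streaks` is a hypothetical name, and the dates are the sample StartDate values written as ISO strings.

```python
from collections import Counter

# Sample data from the question (dates as ISO strings, an assumption for sorting).
events = {1: "2018-07-15", 2: "2018-07-16", 3: "2018-07-18", 4: "2018-07-20"}
attendance = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 4), (2, 3),
              (3, 4), (3, 2), (3, 1), (4, 4), (4, 3), (4, 2)]  # (User_ID, Event_ID)

# evt_no: 1 for the newest event, 2 for the one before it, and so on.
ordered = sorted(events, key=events.get, reverse=True)
evt_no = {event_id: n for n, event_id in enumerate(ordered, start=1)}


def latest_streaks(attendance, evt_no):
    """Mirror the att/con CTEs: group by evt_no minus the per-user row number."""
    by_user = {}
    for user_id, event_id in attendance:
        by_user.setdefault(user_id, []).append(evt_no[event_id])

    result = {}
    for user_id, numbers in by_user.items():
        numbers.sort()  # newest attended event first
        # grp_no is constant across an unbroken run of attended events
        groups = Counter(n - rn for rn, n in enumerate(numbers, start=1))
        result[user_id] = groups[min(groups)]  # smallest grp_no = most recent run
    return result


print(latest_streaks(attendance, evt_no))  # {1: 4, 2: 2, 3: 1, 4: 3}
```

For user 3, the attended events have evt_no 1, 3, 4 and row numbers 1, 2, 3, giving differences 0, 1, 1: one group of size 1 (the latest event) and an older group of size 2. A user who missed the latest event has no group with difference 0, so the query then reports their most recent earlier streak; filtering on grp_no = 0 would restrict the result to streaks that include the latest event.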