Zip / repeating join?

Time: 2016-10-30 13:08:11

Tags: sql tsql

Suppose I have a simple Documents table with a Type column:

Documents
Id  Type
1   A
2   A
3   B
4   C
5   C
6   A
7   A
8   A
9   B
10  C

Users have permission to access different types of documents:

Permissions
Type    User
A       John
A       Jane
B       Sarah
C       Peter
C       John
C       Mark

I need to distribute these documents among the users as tasks:

Tasks
Id  Type  DocId  UserId
1   A     1      John
2   A     2      Jane
3   B     3      Sarah
4   C     4      Peter
5   C     5      John
6   A     6      John
7   A     7      Jane
8   A     8      John
9   B     9      Sarah
10  C     10     Mark

How do I do this? How can I produce the Tasks rows?

2 Answers:

Answer 0 (score: 2)

You can enumerate the rows within each type and then match documents to users with modulo arithmetic:

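with d as (
      -- number the documents within each type (random order via newid())
      select d.*,
             row_number() over (partition by type order by newid()) as seqnum
      from documents d
     ),
      u as (
       -- number the permitted users within each type and count them
       select p.*,
              row_number() over (partition by type order by newid()) as seqnum,
              count(*) over (partition by type) as cnt
       from permissions p
      )
-- document n of a type goes to user (n % cnt) + 1 of that type
select d.id as docid, d.type, u.[user] as userid
from d join
     u
     on d.type = u.type and
        u.seqnum = (d.seqnum % u.cnt) + 1
order by d.id;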
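
The modulo matching can be checked on the sample data with a small script. This is only a sketch under stated assumptions: it runs the query through Python's sqlite3 module (SQLite dialect rather than T-SQL, and it assumes the bundled SQLite is 3.25 or newer so that window functions are available), it orders rows deterministically instead of using newid(), and it adds a hypothetical perm_id column to remember the order in which the permissions were listed. It also subtracts 1 before taking the modulo so that the first document of each type goes to the first listed user.

```python
import sqlite3

# In-memory database holding the sample data from the question.
# perm_id is a hypothetical column: it only records the listing order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table documents (id integer primary key, type text);
    create table permissions (perm_id integer primary key, type text, "user" text);
    insert into documents (id, type) values
        (1,'A'),(2,'A'),(3,'B'),(4,'C'),(5,'C'),
        (6,'A'),(7,'A'),(8,'A'),(9,'B'),(10,'C');
    insert into permissions (type, "user") values
        ('A','John'),('A','Jane'),('B','Sarah'),
        ('C','Peter'),('C','John'),('C','Mark');
""")

ASSIGN_SQL = """
with d as (
    -- position of each document within its type, by id
    select id, type,
           row_number() over (partition by type order by id) as seqnum
    from documents
),
u as (
    -- position of each user within the type, plus users per type
    select type, "user",
           row_number() over (partition by type order by perm_id) as seqnum,
           count(*) over (partition by type) as cnt
    from permissions
)
select d.id, d.type, u."user"
from d
join u
  on d.type = u.type
 and u.seqnum = ((d.seqnum - 1) % u.cnt) + 1   -- round-robin within the type
order by d.id
"""

def assign_tasks(connection):
    """Return (doc_id, type, user) rows, one per document."""
    return connection.execute(ASSIGN_SQL).fetchall()

tasks = assign_tasks(conn)
for doc_id, doc_type, user in tasks:
    print(doc_id, doc_type, user)
```

With these assumptions the result matches the Tasks table in the question. In T-SQL the same idea applies, with the column quoted as [User] and an ordering column of your own in place of perm_id.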

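Answer 1 (score: 0)

Good question.

  • This solution returns all possible distributions, ordered by a priority derived from statistics such as the number of users involved, the minimum number of documents per user, and the standard deviation of documents per user.
  • I do not assume that documents.id is a sequence of numbers starting at 1, hence the use of dense_rank.
  • The core of the solution is an iterative (recursive) CTE that generates the record sets of all possible distributions.
  • Execution time on my laptop is about 20 seconds (the iterative part takes 5 seconds).
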
-- doc_user: every (document, permitted user) pair; doc_seq numbers the documents 1..N
with        doc_user    as 
            (
                select          d."id"                                          as docid
                               ,p."user"                                        as userid
                               ,dense_rank  ()  over (order by      d."id")     as doc_seq

                from                documents       d

                        left join   permissions     p

                        on          p.type = d.type
            )

           -- it_cte: recursive CTE that extends every partial assignment by one document per step
           ,it_cte  as 
            (
                select      docid
                           ,userid
                           ,doc_seq
                           ,cast (coalesce(userid,'') as varchar(max))  as path
                           ,'A'                                         as cte_part

                from        doc_user    

                where       doc_seq = 1


                union all

                select      r.docid
                           ,r.userid
                           ,du.doc_seq
                           ,r.path + ',' + coalesce (du.userid,'')
                           ,'B'


                from                    it_cte      as r

                            cross join  doc_user    as du

                where       du.doc_seq = r.doc_seq + 1


                union all

                select      du.docid
                           ,du.userid
                           ,du.doc_seq
                           ,r.path + ',' + coalesce (du.userid,'')
                           ,'C'


                from                    it_cte      as r

                            cross join  doc_user    as du

                where       du.doc_seq = r.doc_seq + 1
                        and r.cte_part in ('A','C') 
            )

           -- result_sets: keep only complete paths (every document assigned) and number them
           ,result_sets as
            (
                select      dense_rank  () over (order by path) as set_id   
                           ,docid
                           ,userid

                from        it_cte

                where       doc_seq = (select count(*) from documents)
            )

           ,result_sets_stat as
            (
                select      set_id
                           ,count   (distinct userid)   as users_involved

                from        result_sets

                group by    set_id
            )

           ,result_sets_users_stat as
            (
                select      set_id
                           ,min     (doc)   min_doc_per_user
                           ,stdevp  (doc)   stdevp_doc_per_user

                from       (select      set_id
                                       ,userid
                                       ,count   (*) as doc

                            from        result_sets

                            group by    set_id
                                       ,userid
                            ) t

                group by    set_id
            )

select      s.set_priority
           ,r.docid
           ,r.userid
           ,s.users_involved        
           ,s.min_doc_per_user              
           ,s.stdevp_doc_per_user       

from                    (select     s.set_id
                                   ,s.users_involved        
                                   ,u.min_doc_per_user              
                                   ,u.stdevp_doc_per_user   

                                   ,row_number () over      
                                    (
                                        order by    s.users_involved        desc
                                                   ,u.min_doc_per_user      desc
                                                   ,u.stdevp_doc_per_user   
                                                   ,s.set_id  

                                    )   as set_priority

                        from                    result_sets_stat        as s

                                    join        result_sets_users_stat  as u

                                    on          u.set_id    =
                                                s.set_id
                        ) s

            join        result_sets as r

            on          r.set_id    =
                        s.set_id    

order by    s.set_priority 
           ,r.docid

option      (merge join) 
;
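
The ranking idea in this answer (enumerate every possible distribution, then order by users involved, minimum documents per user, and population standard deviation) can be illustrated outside SQL. The following is a minimal Python sketch, not a translation of the query above: the function name ranked_distributions and the dictionary layout are made up for illustration, and it assumes every document type has at least one permitted user (the SQL uses a left join and so also tolerates types with no users).

```python
from collections import Counter
from itertools import product
from statistics import pstdev

# Sample data from the question.
documents = {1: 'A', 2: 'A', 3: 'B', 4: 'C', 5: 'C',
             6: 'A', 7: 'A', 8: 'A', 9: 'B', 10: 'C'}
permissions = {'A': ['John', 'Jane'],
               'B': ['Sarah'],
               'C': ['Peter', 'John', 'Mark']}

def ranked_distributions(documents, permissions):
    """Enumerate every assignment of documents to permitted users and
    sort them: most users involved first, then highest minimum number
    of documents per user, then lowest population standard deviation."""
    doc_ids = sorted(documents)
    # One list of candidate users per document, in document order.
    candidates = [permissions[documents[d]] for d in doc_ids]
    scored = []
    for users in product(*candidates):
        # Only users who received at least one document are counted,
        # as in the grouped statistics of the SQL query.
        counts = list(Counter(users).values())
        stats = (len(counts), min(counts), pstdev(counts))
        scored.append((stats, dict(zip(doc_ids, users))))
    scored.sort(key=lambda item: (-item[0][0], -item[0][1], item[0][2]))
    return scored

ranking = ranked_distributions(documents, permissions)
best_stats, best_assignment = ranking[0]
print(len(ranking), best_stats)
print(best_assignment)
```

The number of candidate distributions is the product of the permitted-user counts over all documents, so this brute-force approach, like the recursive CTE, is only practical for small inputs.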