Question

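I have a result set. For simplicity I'll refer to a table "tab" with three columns, Category, Subcategory and Date, ordered by category and then by date. This data set is a grid on which I want to do further processing. My problem is uniquely identifying (or sequentially labelling) the groups in the data set. In the sample, the groups are runs of consecutive 'pear' rows within a category. Given the first three columns, the GID1 and GID2 values in the SQL below are what I'm after (either GID1 or GID2 would do). I have tried group_id, grouping_id, rank and dense_rank. Either I'm missing a trick with one of them, or I'm attempting something very awkward. The order of the GIDs doesn't matter, but the group numbers must be assigned based on the ordered data (category, then date).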

    CREATE TABLE Tab
        ("Category" varchar2(1), "SubCategory" varchar2(7), "Date" int, "GID1" int, "GID2" int);

    INSERT ALL 
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'bannana', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'grape', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120103, 1, 1)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120104, 1, 1)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'bannana', 20120105, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120106, 2, 2)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120107, 2, 2)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'apple', 20120108, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120109, 3, 3)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'apple', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'bannana', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'apple', 20120103, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'bannana', 20120104, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120105, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120106, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120107, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120108, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120109, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120103, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'bannana', 20120104, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120105, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'pear', 20120106, 1, 5)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120107, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120108, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120109, NULL, NULL)
    SELECT * FROM dual
    ;
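To make the target concrete, here is a small procedural sketch in Python that derives the expected GID1/GID2 values from the sample rows. It is an illustration only and not SQL. It assumes, as the sample data does, that each category has one row per day from 20120101 to 20120109. It treats a group as a maximal run of consecutive 'pear' rows within a category. The helper name label_runs is hypothetical.

```python
from itertools import groupby

# Sample data from the question: one row per day (20120101..20120109)
# for each category, listed in date order.
SUBCATEGORIES = {
    "A": ["bannana", "grape", "pear", "pear", "bannana", "pear", "pear", "apple", "pear"],
    "B": ["apple", "bannana", "apple", "bannana", "pear", "pear", "pear", "pear", "pear"],
    "C": ["grape", "grape", "apple", "bannana", "grape", "pear", "apple", "apple", "apple"],
}
ROWS = [(cat, sub, 20120101 + i)
        for cat, subs in SUBCATEGORIES.items()
        for i, sub in enumerate(subs)]


def label_runs(rows, target="pear"):
    """Return (category, subcategory, date, gid1, gid2) tuples.

    A group is a maximal run of consecutive `target` rows within one
    category once the rows are sorted by category, then date.
    gid1 restarts in every category; gid2 keeps counting across categories.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    out = []
    gid2 = 0
    for _, cat_rows in groupby(ordered, key=lambda r: r[0]):
        gid1 = 0
        # Split one category into runs of identical subcategory values.
        for sub, run in groupby(cat_rows, key=lambda r: r[1]):
            if sub == target:
                gid1 += 1
                gid2 += 1
                out.extend((c, s, d, gid1, gid2) for c, s, d in run)
            else:
                out.extend((c, s, d, None, None) for c, s, d in run)
    return out


# Print only the labelled rows.
for row in label_runs(ROWS):
    if row[3] is not None:
        print(row)
```

The labelled rows should match the non-null GID1/GID2 values in the INSERT statement above.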

Answer 1

Well, if it's only pears, then:

select "Category", "SubCategory", "Date",
       case
         when "SubCategory" = 'pear'
         then count(rn) over (partition by "Category" order by "Date")
         else null
       end GID1,
       case
         when "SubCategory" = 'pear'
         then count(rn) over (order by "Category", "Date")
         else null
       end GID2
  from (select "Category", "SubCategory", "Date",
               -- shown in the breakdown below; not used by the outer query
               lag("SubCategory") over (partition by "Category" order by "Date") prev_sub,
               case
                 -- a pear that follows a different subcategory starts a new group
                 when lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory"
                  and "SubCategory" = 'pear'
                 then 1
                 -- a pear on the first row of its category also starts a group
                 when row_number() over (partition by "Category" order by "Date") = 1
                  and "SubCategory" = 'pear'
                 then 1
                 else null
               end rn
          from tab)
 order by 1, 3;

Category   SubCate       Date       GID1       GID2
---------- ------- ---------- ---------- ----------
A          bannana   20120101
A          grape     20120102
A          pear      20120103          1          1
A          pear      20120104          1          1
A          bannana   20120105
A          pear      20120106          2          2
A          pear      20120107          2          2
A          apple     20120108
A          pear      20120109          3          3
B          apple     20120101
B          bannana   20120102
B          apple     20120103
B          bannana   20120104
B          pear      20120105          1          4
B          pear      20120106          1          4
B          pear      20120107          1          4
B          pear      20120108          1          4
B          pear      20120109          1          4
C          grape     20120101
C          grape     20120102
C          apple     20120103
C          bannana   20120104
C          grape     20120105
C          pear      20120106          1          5
C          apple     20120107
C          apple     20120108
C          apple     20120109
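If you want to try the query without an Oracle instance, the sketch below runs essentially the same statement against an in-memory SQLite database from Python. This is an assumed port, not the answerer's setup. It needs SQLite 3.25 or later for window functions. It uses SQLite's text/integer types instead of varchar2/int and loads the rows with executemany instead of INSERT ALL. It also drops the unused lag column and the explicit else null, since a CASE without ELSE already yields NULL.

```python
import sqlite3

# Sample data from the question: one row per day (20120101..20120109)
# for each category, listed in date order.
SUBCATEGORIES = {
    "A": ["bannana", "grape", "pear", "pear", "bannana", "pear", "pear", "apple", "pear"],
    "B": ["apple", "bannana", "apple", "bannana", "pear", "pear", "pear", "pear", "pear"],
    "C": ["grape", "grape", "apple", "bannana", "grape", "pear", "apple", "apple", "apple"],
}

# Answer 1's query, lightly adapted for SQLite 3.25+ (window functions).
QUERY = """
select "Category", "SubCategory", "Date",
       case when "SubCategory" = 'pear'
            then count(rn) over (partition by "Category" order by "Date")
       end as gid1,
       case when "SubCategory" = 'pear'
            then count(rn) over (order by "Category", "Date")
       end as gid2
  from (select "Category", "SubCategory", "Date",
               case
                 when lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory"
                      and "SubCategory" = 'pear' then 1
                 when row_number() over (partition by "Category" order by "Date") = 1
                      and "SubCategory" = 'pear' then 1
               end as rn
          from tab)
 order by 1, 3
"""


def run_answer1():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('create table tab ("Category" text, "SubCategory" text, "Date" integer)')
        conn.executemany(
            "insert into tab values (?, ?, ?)",
            [(cat, sub, 20120101 + i)
             for cat, subs in SUBCATEGORIES.items()
             for i, sub in enumerate(subs)],
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run_answer1():
    print(row)
```

Oracle and SQLite both default an analytic COUNT with ORDER BY to a running window, which is what the technique relies on, so the printed rows should line up with the output above.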

Breaking this down:

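For each "Category", we look at the previous row (ordered by "Date") to see whether it has a different "SubCategory", and whether the current "SubCategory" is pear. If so, we flag the row with 1 (the value itself doesn't matter, as long as it is non-null).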

lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory" 
 and "SubCategory" = 'pear'

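We also flag the first row of each category when it is a pear. On that row lag() returns NULL, so the != comparison is not true and the second branch (row_number() = 1) is needed. This gives us: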

Category   SubCate       Date LAG("SU         RN
---------- ------- ---------- ------- ----------
A          bannana   20120101
A          grape     20120102 bannana
A          pear      20120103 grape            1
A          pear      20120104 pear
A          bannana   20120105 pear
A          pear      20120106 bannana          1
A          pear      20120107 pear
A          apple     20120108 pear
A          pear      20120109 apple            1
B          apple     20120101
B          bannana   20120102 apple
B          apple     20120103 bannana
B          bannana   20120104 apple
B          pear      20120105 bannana          1
B          pear      20120106 pear
B          pear      20120107 pear
B          pear      20120108 pear
B          pear      20120109 pear
C          grape     20120101
C          grape     20120102 grape
C          apple     20120103 grape
C          bannana   20120104 apple
C          grape     20120105 bannana
C          pear      20120106 grape            1
C          apple     20120107 pear
C          apple     20120108 apple
C          apple     20120109 apple
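The flagging step can be emulated in plain Python to see why exactly those five rows get RN = 1. This is an illustrative sketch only. start_flags is a hypothetical helper, and the data is rebuilt from the sample, with rows listed in date order from 20120101 onwards within each category.

```python
# Sample data from the question: one row per day (20120101..20120109)
# for each category, listed in date order.
SUBCATEGORIES = {
    "A": ["bannana", "grape", "pear", "pear", "bannana", "pear", "pear", "apple", "pear"],
    "B": ["apple", "bannana", "apple", "bannana", "pear", "pear", "pear", "pear", "pear"],
    "C": ["grape", "grape", "apple", "bannana", "grape", "pear", "apple", "apple", "apple"],
}


def start_flags(subcategories, target="pear"):
    """Emulate the inner query's rn column: {(category, date): 1 or None}.

    rn is 1 when the row is `target` and either it is the first row of its
    category (row_number() = 1) or the previous row's subcategory (lag())
    differs; otherwise it is None, standing in for SQL NULL.
    """
    flags = {}
    for cat, subs in subcategories.items():
        prev = None  # lag() has no previous row for the first row
        for i, sub in enumerate(subs):
            first_row = i == 0
            starts_run = sub == target and (first_row or prev != sub)
            flags[(cat, 20120101 + i)] = 1 if starts_run else None
            prev = sub
    return flags


# The rows flagged with rn = 1, i.e. the starts of the pear groups.
print(sorted(k for k, v in start_flags(SUBCATEGORIES).items() if v == 1))
```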

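Now we simply count() the non-null RN values, again ordered by "Date": per category for GID1, and across all categories for GID2 (which is why GID2 also orders by "Category"). COUNT ignores NULLs, and an analytic COUNT with an ORDER BY defaults to a running window from the first row of the partition up to the current row. So each flagged row bumps the count by one, and the unflagged pear rows that follow it inherit the same number. The outer CASE then blanks out the non-pear rows. The two expressions are:

count(rn) over (partition by "Category" order by "Date") (GID1)

and count(rn) over (order by "Category", "Date") (GID2)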
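Putting both steps together, here is a Python sketch of the running count. COUNT(rn) ignores NULLs, so each flagged row advances the counter, and the rows that follow inherit the current value until the next flag. gids is a hypothetical helper name, and the sample data is assumed to be in date order within each category.

```python
# Sample data from the question: one row per day (20120101..20120109)
# for each category, listed in date order.
SUBCATEGORIES = {
    "A": ["bannana", "grape", "pear", "pear", "bannana", "pear", "pear", "apple", "pear"],
    "B": ["apple", "bannana", "apple", "bannana", "pear", "pear", "pear", "pear", "pear"],
    "C": ["grape", "grape", "apple", "bannana", "grape", "pear", "apple", "apple", "apple"],
}


def gids(subcategories, target="pear"):
    """Return (category, subcategory, date, gid1, gid2) in category, date order."""
    # Step 1: the rn flag, as in the inner query.
    rows = []
    for cat in sorted(subcategories):
        prev = None
        for i, sub in enumerate(subcategories[cat]):
            rn = 1 if sub == target and (i == 0 or prev != sub) else None
            rows.append((cat, sub, 20120101 + i, rn))
            prev = sub

    # Step 2: running counts. COUNT(rn) skips NULLs, so only flagged rows
    # advance a counter; later rows inherit the current value.
    result = []
    per_category = {}  # count(rn) over (partition by "Category" order by "Date")
    overall = 0        # count(rn) over (order by "Category", "Date")
    for cat, sub, date, rn in rows:
        if rn is not None:
            per_category[cat] = per_category.get(cat, 0) + 1
            overall += 1
        if sub == target:  # the outer CASE keeps numbers for pear rows only
            result.append((cat, sub, date, per_category.get(cat, 0), overall))
        else:
            result.append((cat, sub, date, None, None))
    return result


for row in gids(SUBCATEGORIES):
    if row[3] is not None:
        print(row)
```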

Answer 2

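I never thought this could be done with a count. Brilliant. Starting with version 11gR2, it can also be done with a recursive hierarchical query (recursive subquery factoring).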

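with r as (
  -- number the rows of each subcategory by category, then date
  select "Category"
    , "SubCategory"
    , "Date"
    , row_number() over (partition by "SubCategory" order by "Category", "Date") rn
  from tab
)
, fwd ( "Category", "SubCategory", "Date", rn, GID1, GID2) as (
  -- anchor member: the first row of every subcategory starts group 1
  select "Category"
    , "SubCategory"
    , "Date"
    , rn
    , 1
    , 1
  from r
  where rn = 1
  union all
  -- recursive member: step to the next row of the same subcategory
  select nxt."Category"
    , nxt."SubCategory"
    , nxt."Date"
    , nxt.rn
    -- GID1: restart at 1 in a new category; otherwise keep the group
    -- number if the dates are adjacent, else start a new group
    , decode( nxt."Category"
      , prev."Category", decode( nxt."Date"
        , prev."Date" + 1, prev.gid1
        , prev.gid1 + 1
      )
      , 1
    ) as gid1
    -- GID2: same adjacency test, but it never restarts, and a change of
    -- category always starts a new group
    , decode( nxt."Category"
      , prev."Category", decode( nxt."Date"
        , prev."Date" + 1, prev.gid2
        , prev.gid2 + 1
      )
      , prev.gid2 + 1
    ) as gid2
  from fwd prev
    , r nxt
  where prev.rn + 1 = nxt.rn
    and prev."SubCategory" = nxt."SubCategory"
)
select "Category"
  , "SubCategory"
  , "Date"
  , decode( "SubCategory", 'pear', GID1, null ) as gid1
  , decode( "SubCategory", 'pear', GID2, null ) as gid2
from fwd
order by "Category", "Date";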

It produces the same result:

Category SubCategory       Date       GID1       GID2
-------- ----------- ---------- ---------- ----------
A        bannana       20120101                       
A        grape         20120102                       
A        pear          20120103          1          1 
A        pear          20120104          1          1 
A        bannana       20120105                       
A        pear          20120106          2          2 
A        pear          20120107          2          2 
A        apple         20120108                       
A        pear          20120109          3          3 
B        apple         20120101                       
B        bannana       20120102                       
B        apple         20120103                       
B        bannana       20120104                       
B        pear          20120105          1          4 
B        pear          20120106          1          4 
B        pear          20120107          1          4 
B        pear          20120108          1          4 
B        pear          20120109          1          4 
C        grape         20120101                       
C        grape         20120102                       
C        apple         20120103                       
C        bannana       20120104                       
C        grape         20120105                       
C        pear          20120106          1          5 
C        apple         20120107                       
C        apple         20120108                       
C        apple         20120109

It is arguably more self-explanatory.

If you remove the decode from the final select, it also produces correct GID1 and GID2 numbers for all the other subcategories, not just 'pear'.
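The recursive query walks each subcategory's rows one step at a time and carries the previous row's GID values forward. Below is a hedged Python emulation of that walk for all subcategories, the equivalent of dropping the decode. chain_gids is a hypothetical name. Like the SQL, it assumes consecutive days differ by exactly 1 in the integer "Date" column. That holds for the sample data but not across month boundaries of yyyymmdd integers (20120131 + 1 is 20120132). It also treats a category change as a break for GID2, so a run that ends on the last date of one category is never merged with a run that starts on the next date in the following category.

```python
# Sample data from the question: one row per day (20120101..20120109)
# for each category, listed in date order.
SUBCATEGORIES = {
    "A": ["bannana", "grape", "pear", "pear", "bannana", "pear", "pear", "apple", "pear"],
    "B": ["apple", "bannana", "apple", "bannana", "pear", "pear", "pear", "pear", "pear"],
    "C": ["grape", "grape", "apple", "bannana", "grape", "pear", "apple", "apple", "apple"],
}


def chain_gids(subcategories):
    """Number date-adjacent runs for every subcategory, step by step.

    Returns {(category, date): (subcategory, gid1, gid2)}.
    """
    rows = [(cat, sub, 20120101 + i)
            for cat, subs in subcategories.items()
            for i, sub in enumerate(subs)]
    result = {}
    for target in sorted({sub for _, sub, _ in rows}):
        # Rows of one subcategory in (category, date) order, like rn in r.
        chain = sorted((cat, date) for cat, sub, date in rows if sub == target)
        prev = None
        for cat, date in chain:
            if prev is None:
                gid1 = gid2 = 1  # anchor member: rn = 1
            else:
                prev_cat, prev_date, prev_gid1, prev_gid2 = prev
                same_cat = cat == prev_cat
                adjacent = same_cat and date == prev_date + 1
                # GID1 restarts in a new category; GID2 never restarts.
                gid1 = (prev_gid1 if adjacent else prev_gid1 + 1) if same_cat else 1
                gid2 = prev_gid2 if adjacent else prev_gid2 + 1
            result[(cat, date)] = (target, gid1, gid2)
            prev = (cat, date, gid1, gid2)
    return result


for key, value in sorted(chain_gids(SUBCATEGORIES).items()):
    print(key, value)
```

For the pear rows this reproduces the table above; for the other subcategories it shows the extra numbering you get without the decode.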

Choosing between this variant and the one provided by @DazzaL requires a cost comparison.

Oracle SQL sequential group number assignment
