List the second-highest-priced item for each set

Time: 2013-12-31 00:17:17

Tags: sql db2 greatest-n-per-group

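I have attached the table relationships for this problem. I need to find the second-highest-priced item for each Set ID. This seems very hard; can anyone help me?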

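  1. set_id is the primary key of the Sets table.
  2. dset_id is the primary key of Dset.
  3. eff_dt_id is the primary key of the Eff_dt table.
  4. set_id is a foreign key in the Dset table, referencing set_id in the Sets table.
  5. dset_id is a foreign key in the Eff_dt and Dset_data_asgn tables, referencing the Dset table.
  6. inst_id is the primary key of Biz_tbl.
  7. inst_id is a foreign key in the Dset_data_asgn table, referencing Biz_tbl.
  8. (dset_id, inst_id) is the composite primary key of Dset_data_asgn.

[Image: table structure]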

2 answers:

Answer 0 (score: 4)

If your version of DB2 supports window functions, you can greatly simplify @Jonathan's answer:

SELECT Sets.setName, Node.node_name, Ordered.data_id as menuItem, Ordered.price
FROM (SELECT DSet.set_id, DSet_Data_Asgn.data_id, Blz_Tbl.price, 
             ROW_NUMBER() OVER(PARTITION BY DSet.set_id ORDER BY Blz_Tbl.price DESC) as rn
      FROM DSet
      JOIN DSet_Data_Asgn
        ON DSet_Data_Asgn.DSet_id = DSet.DSet_id
      JOIN Blz_Tbl
        ON Blz_Tbl.Inst_id = DSet_Data_Asgn.Inst_id) Ordered
JOIN Sets
  ON Sets.set_id = Ordered.set_id
JOIN Node
  ON Node.node_id = Sets.node_id
WHERE Ordered.rn = 2
ORDER BY Node.node_name DESC

(There is an SQL Fiddle example; it uses SQL Server, but the syntax is the same.)

The result is:

setname node_name menuitem price
set1    US        m2        2.98
set2    Chicago   m1        2

(Thanks for the schema setup script, Jonathan; it made my life easier.)
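
One detail worth noting about ties: ROW_NUMBER() numbers rows arbitrarily when prices are equal, so if two items in a set shared the top price, rn = 2 would return one of those top-priced items. The MAX-based approach in the other answer instead returns the next distinct lower price. The following is a minimal sketch of a variant that matches that behavior by using DENSE_RANK(); the alias rnk is only illustrative, and it assumes your DB2 version supports DENSE_RANK() alongside ROW_NUMBER().

```sql
-- Sketch: second-highest *distinct* price per set, using DENSE_RANK().
-- Items that tie on the second-highest price are all returned.
SELECT Sets.setName, Node.node_name, Ordered.data_id AS menuItem, Ordered.price
FROM (SELECT DSet.set_id, DSet_Data_Asgn.data_id, Blz_Tbl.price,
             DENSE_RANK() OVER(PARTITION BY DSet.set_id
                               ORDER BY Blz_Tbl.price DESC) AS rnk
      FROM DSet
      JOIN DSet_Data_Asgn
        ON DSet_Data_Asgn.DSet_id = DSet.DSet_id
      JOIN Blz_Tbl
        ON Blz_Tbl.Inst_id = DSet_Data_Asgn.Inst_id) Ordered
JOIN Sets
  ON Sets.set_id = Ordered.set_id
JOIN Node
  ON Node.node_id = Sets.node_id
WHERE Ordered.rnk = 2
ORDER BY Node.node_name DESC
```

With the sample data in this thread every price within a set is distinct, so this variant returns the same two rows as the ROW_NUMBER() query.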

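Answer 1 (score: 3)

You need to select the largest value in the data that is less than the actual maximum, so we can expect a couple of sub-queries involving MAX.

Also, since this is a fairly complex query, we can apply TDQD (Test-Driven Query Design) and solve the problem in stages.

TDQD - Test-Driven Query Design

Step 1: General view of the joined tables

The first step is to join all the data from the five tables, just to get a feel for the data and for how the joins work: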

SELECT S.Set_ID,
       S.SetName,
       S.Node_ID,
       N.Node_Name,
       D.Dset_ID,
       A.Data_ID,
       B.Inst_ID,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID  = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 ORDER BY S.Set_ID, B.Price;

Output:

 set_id setname  node_id node_name   dset_id data_id inst_id   price
      1 set1           1 US              101 m1          301    2.00
      1 set1           1 US              101 m2          302    2.15
      1 set1           1 US              102 m1          304    2.25
      1 set1           1 US              103 m1          305    2.50
      1 set1           1 US              104 m1          306    2.85
      1 set1           1 US              104 m2          307    2.98    *
      1 set1           1 US              101 m3          303    3.00
      2 set2           2 Chicago         105 m1          308    1.00
      2 set2           2 Chicago         105 m1          309    2.00    *
      2 set2           2 Chicago         106 m2          310    3.00

We can see that we want to select the data from the two rows marked with * at the end.

Step 2: Find the maximum price for each Set ID

SELECT D.Set_ID, MAX(B.Price) AS Price
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 GROUP BY D.Set_ID
 ORDER BY D.Set_ID;

Output:

     set_id   price
          1    3.00
          2    3.00

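Step 3: Find the second-highest price for each Set_ID

We need to use the previous query as a sub-query and join its result with a very similar query. The join condition B.Price < M.Price keeps only the prices strictly below each set's maximum, so the MAX over those rows is the second-highest price: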

SELECT D.Set_ID, MAX(B.Price) AS Price
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
         GROUP BY D.Set_ID
       ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
 GROUP BY D.Set_ID
 ORDER BY D.Set_ID;

Output:

     set_id   price
          1    2.98
          2    2.00

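We now have the Set_ID and the second-highest price; we just need to collect the other information. In effect, we need to treat the previous query as (another) sub-query.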
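
One consequence of the inner join on B.Price < M.Price is that a set with only one distinct price has no row below its maximum and silently disappears from the Step 3 result. As a hedged sanity check in the same test-as-you-go spirit, the sketch below lists any such sets; the column alias Distinct_Prices is just an illustrative name.

```sql
-- Sketch: list sets that have fewer than two distinct prices,
-- i.e. sets that the Step 3 query would drop from its output.
SELECT D.Set_ID, COUNT(DISTINCT B.Price) AS Distinct_Prices
  FROM Dset           AS D
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
 GROUP BY D.Set_ID
HAVING COUNT(DISTINCT B.Price) < 2
 ORDER BY D.Set_ID;
```

With the sample data shown here, both sets have several distinct prices, so this check returns no rows.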

Step 4: Collect the ID values and other data for the result

SELECT S.Set_ID,
       S.SetName,
       S.Node_ID,
       N.Node_Name,
       D.Dset_ID,
       A.Data_ID,
       B.Inst_ID,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID  = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
          JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
                  FROM Dset           AS D
                  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
                  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
                 GROUP BY D.Set_ID
               ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID
        ) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.Set_ID;

Output:

     set_id setname  node_id node_name   dset_id data_id inst_id   price
          1 set1           1 US              104 m2          307    2.98
          2 set2           2 Chicago         105 m1          309    2.00

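This data matches the desired output but includes various ID columns that are not actually needed. So the last step is to remove those columns from the SELECT list, leading to the final query.

Final query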

SELECT S.SetName,
       N.Node_Name,
       A.Data_ID AS MenuItem,
       B.Price
  FROM Sets           AS S
  JOIN Node           AS N ON S.Node_ID = N.Node_ID
  JOIN Dset           AS D ON S.Set_ID  = D.Set_ID
  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
  JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
          FROM Dset           AS D
          JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
          JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
          JOIN (SELECT D.Set_ID, MAX(B.Price) AS Price
                  FROM Dset           AS D
                  JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
                  JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
                 GROUP BY D.Set_ID
               ) AS M ON D.Set_ID = M.Set_ID AND B.Price < M.Price
         GROUP BY D.Set_ID
        ) AS X ON X.Set_ID = S.Set_ID AND X.Price = B.Price
 ORDER BY S.SetName;

Final output

setname node_name menuitem price
set1    US        m2        2.98
set2    Chicago   m1        2.00
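
The final query repeats the same three-table join at three levels of nesting. If your DB2 version supports common table expressions, the same logic could be written with the shared join named once. This is only a sketch of an equivalent formulation, not part of the staged development above; the CTE names Priced, MaxPrice, and SecondPrice are made up for illustration.

```sql
-- Sketch: the same staged logic expressed with common table expressions.
WITH Priced AS (
    -- Every (set, menu item, price) combination; the join used in every step
    SELECT D.Set_ID, A.Data_ID, B.Price
      FROM Dset           AS D
      JOIN Dset_Data_Asgn AS A ON D.Dset_ID = A.Dset_ID
      JOIN Blz_Tbl        AS B ON A.Inst_ID = B.Inst_ID
),
MaxPrice AS (
    -- Step 2: maximum price per set
    SELECT Set_ID, MAX(Price) AS Price
      FROM Priced
     GROUP BY Set_ID
),
SecondPrice AS (
    -- Step 3: largest price strictly below the maximum
    SELECT P.Set_ID, MAX(P.Price) AS Price
      FROM Priced   AS P
      JOIN MaxPrice AS M ON P.Set_ID = M.Set_ID AND P.Price < M.Price
     GROUP BY P.Set_ID
)
-- Step 4: attach the names and the menu item
SELECT S.SetName,
       N.Node_Name,
       P.Data_ID AS MenuItem,
       P.Price
  FROM Sets        AS S
  JOIN Node        AS N ON S.Node_ID = N.Node_ID
  JOIN Priced      AS P ON P.Set_ID  = S.Set_ID
  JOIN SecondPrice AS X ON X.Set_ID  = P.Set_ID AND X.Price = P.Price
 ORDER BY S.SetName;
```

Each CTE corresponds to one of the steps, so it can still be tested on its own by selecting from it directly.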

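Summary

The key to the development is building and testing the query step by step. The hardest step is Step 3. It takes some practice to design queries like that, but after a few years (20 or so) it becomes second nature. Stepwise refinement (or test-driven development) of complex queries is essential, though. You test each step as shown to make sure the answers are what you expect. If a result is not correct, you can revise the current query, or go back and revise the earlier queries if you find they are not what you need.

I really did create the final query in these distinct stages. I wouldn't consider doing it any other way. You can search for TDQD (optionally within the [sql] tag) and you will see other examples of developing complex queries step by step.

SQL to create and load the schema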

CREATE TABLE Node
(
    Node_ID     INTEGER NOT NULL PRIMARY KEY,
    Node_Name   CHAR(7) NOT NULL
);

CREATE TABLE Sets
(
    Set_ID  INTEGER NOT NULL PRIMARY KEY,
    SetName CHAR(4) NOT NULL UNIQUE,
    Mkt_ID  INTEGER NOT NULL,
    Node_ID INTEGER NOT NULL REFERENCES Node(Node_ID)
);

CREATE TABLE Dset
(
    Dset_ID INTEGER NOT NULL PRIMARY KEY,
    Set_ID  INTEGER NOT NULL REFERENCES Sets(Set_ID),
    Dltd_Fl INTEGER NOT NULL
);

CREATE TABLE Blz_Tbl
(
    Inst_ID INTEGER NOT NULL PRIMARY KEY,
    Price   DECIMAL(5,2) NOT NULL
);

CREATE TABLE Dset_Data_Asgn
(
    Dset_ID INTEGER NOT NULL REFERENCES Dset(Dset_ID),
    Inst_ID INTEGER NOT NULL REFERENCES Blz_Tbl(Inst_ID),
    PRIMARY KEY(Dset_ID, Inst_ID),
    Data_ID CHAR(2) NOT NULL
);

INSERT INTO Node VALUES(1, 'US');
INSERT INTO Node VALUES(2, 'Chicago');
INSERT INTO Node VALUES(3, 'Florida');

INSERT INTO Sets VALUES(1, 'set1', 1, 1);
INSERT INTO Sets VALUES(2, 'set2', 1, 2);

INSERT INTO Dset VALUES(101, 1, 0);
INSERT INTO Dset VALUES(102, 1, 0);
INSERT INTO Dset VALUES(103, 1, 0);
INSERT INTO Dset VALUES(104, 1, 0);
INSERT INTO Dset VALUES(105, 2, 0);
INSERT INTO Dset VALUES(106, 2, 0);

INSERT INTO Blz_Tbl VALUES(301, 2.00);
INSERT INTO Blz_Tbl VALUES(302, 2.15);
INSERT INTO Blz_Tbl VALUES(303, 3.00);
INSERT INTO Blz_Tbl VALUES(304, 2.25);
INSERT INTO Blz_Tbl VALUES(305, 2.50);
INSERT INTO Blz_Tbl VALUES(306, 2.85);
INSERT INTO Blz_Tbl VALUES(307, 2.98);
INSERT INTO Blz_Tbl VALUES(308, 1.00);
INSERT INTO Blz_Tbl VALUES(309, 2.00);
INSERT INTO Blz_Tbl VALUES(310, 3.00);

INSERT INTO Dset_Data_Asgn VALUES(101, 301, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(101, 302, 'm2');
INSERT INTO Dset_Data_Asgn VALUES(101, 303, 'm3');
INSERT INTO Dset_Data_Asgn VALUES(102, 304, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(103, 305, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(104, 306, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(104, 307, 'm2');
INSERT INTO Dset_Data_Asgn VALUES(105, 308, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(105, 309, 'm1');
INSERT INTO Dset_Data_Asgn VALUES(106, 310, 'm2');
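
After running the script, a quick row count per table is one way to confirm that everything loaded before trying the queries. This is just an optional sketch; the aliases Table_Name and Row_Count are illustrative.

```sql
-- Sketch: row counts per table after loading the sample data.
-- From the INSERT statements above, the expected counts are
-- Node 3, Sets 2, Dset 6, Blz_Tbl 10, Dset_Data_Asgn 10.
SELECT 'Node'           AS Table_Name, COUNT(*) AS Row_Count FROM Node
UNION ALL
SELECT 'Sets'           AS Table_Name, COUNT(*) AS Row_Count FROM Sets
UNION ALL
SELECT 'Dset'           AS Table_Name, COUNT(*) AS Row_Count FROM Dset
UNION ALL
SELECT 'Blz_Tbl'        AS Table_Name, COUNT(*) AS Row_Count FROM Blz_Tbl
UNION ALL
SELECT 'Dset_Data_Asgn' AS Table_Name, COUNT(*) AS Row_Count FROM Dset_Data_Asgn;
```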

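It would be nice if the schema and data were provided by the person asking the question, rather than having to be written by the person answering it!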