Iteratively subsample per distinct value and union the results

Time: 2018-06-01 19:48:23

Tags: sql sql-server tsql

I made a SQL fiddle here.

I have a table in which each row has a category, a document ID, and a rank.

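Within each category, the rows are ordered by their rank. For each category I want to select a subsample, and all of the subsamples should be stacked together.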

I want to build each subsample by repeatedly halving the row index within the category. If a given category has 32 items, I want rows 32, 16, 8, 4, 2 and 1.
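To make the halving rule concrete, here is a minimal Python sketch; `halving_indices` is a hypothetical helper, not something from the question. It assumes integer division, which is what T-SQL does for `int / int`, so a count that is not a power of two (such as 19) yields 19, 9, 4, 2, 1.

```python
def halving_indices(n: int) -> list[int]:
    """Return n, n // 2, n // 4, ... down to 1, mirroring T-SQL int / int division."""
    indices = []
    while n >= 1:
        indices.append(n)
        n //= 2  # integer division truncates, like Row / 2 on an int column
    return indices


print(halving_indices(32))  # [32, 16, 8, 4, 2, 1]
print(halving_indices(19))  # [19, 9, 4, 2, 1]
```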

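In my SQL fiddle I can do this for one specific category, but I can't figure out how to:

a) do it for every category in [Major Focus Area], and b) combine the resulting subsamples into a single table.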

Any tips or help would be greatly appreciated! I'm working in TSQL (MS SQL Server).

Sample data (MS SQL):

CREATE TABLE Rank_MajorAreas
    ([Rank] int, [Major Focus Area] varchar(17), [ID] int)
;

INSERT INTO Rank_MajorAreas
    ([Rank], [Major Focus Area], [ID])
VALUES
    (1, 'Welfare', 71366),
    (2, 'Welfare', 70415),
    (3, 'Truck Driving', 70423),
    (4, 'Peasant''s Office', 74566),
    (5, 'Peasant''s Office', 71560),
    (6, 'Nail Therapy', 77497),
    (7, 'Truck Driving', 76193),
    (8, 'Truck Driving', 79226),
    (9, 'Truck Driving', 70222),
    (10, 'Welfare', 77336),
    (11, 'Truck Driving', 70823),
    (12, 'Welfare', 77096),
    (13, 'Welfare', 71335),
    (14, 'Nail Therapy', 73551),
    (15, 'Welfare', 72146),
    (16, 'Truck Driving', 74023),
    (17, 'Welfare', 71546),
    (18, 'Nail Therapy', 74755),
    (19, 'Peasant''s Office', 77834),
    (20, 'Welfare', 75667),
    (21, 'Peasant''s Office', 71342),
    (22, 'Peasant''s Office', 77457),
    (23, 'Peasant''s Office', 77923),
    (24, 'Welfare', 76508),
    (25, 'Welfare', 75714),
    (26, 'Welfare', 73654),
    (27, 'Welfare', 75753),
    (28, 'Truck Driving', 71481),
    (29, 'Truck Driving', 79424),
    (30, 'Peasant''s Office', 76143),
    (31, 'Truck Driving', 74076),
    (32, 'Nail Therapy', 78714),
    (33, 'Nail Therapy', 79924),
    (34, 'Welfare', 71482),
    (35, 'Welfare', 70050),
    (36, 'Welfare', 76053),
    (37, 'Nail Therapy', 79591),
    (38, 'Peasant''s Office', 75197),
    (39, 'Nail Therapy', 74104),
    (40, 'Welfare', 72891),
    (41, 'Truck Driving', 73621),
    (42, 'Peasant''s Office', 71713),
    (43, 'Welfare', 71979),
    (44, 'Peasant''s Office', 71601),
    (45, 'Peasant''s Office', 73928),
    (46, 'Nail Therapy', 71759),
    (47, 'Nail Therapy', 70379),
    (48, 'Welfare', 71215),
    (49, 'Truck Driving', 70908),
    (50, 'Welfare', 71989)
;

Code so far:

CREATE VIEW MFA AS
  SELECT ROW_NUMBER() OVER(ORDER BY fa.[Rank] ASC) AS Row
        ,*
  FROM Rank_MajorAreas AS fa
  -- ideally we could make a view per Focus Area
  WHERE fa.[Major Focus Area] = 'Welfare'
  ORDER BY Row ASC
  OFFSET 0 ROWS;
-- CREATE VIEW must be the only statement in its batch, so end the batch here
GO

DECLARE @start int
SELECT @start = (SELECT COUNT(*) FROM MFA)

;WITH Sample( Row ) AS
(
  Select @start as Row
    UNION ALL
  SELECT ROUND(Row/2, 0)
    FROM Sample
    WHERE Row > 0
)
SELECT * FROM MFA AS mfa
INNER JOIN Sample AS s on s.Row = mfa.Row
ORDER BY mfa.Row ASC

Desired result: every focus area is subsampled, and the subsamples are returned together as a single result:

Row  Rank  Major Focus Area  ID
1    1     Welfare           71366
2    2     Welfare           70415
4    12    Welfare           77096
9    24    Welfare           76508
19   50    Welfare           71989
...
1    6     Nail Therapy      77497
2    14    Nail Therapy      73551
4    32    Nail Therapy      78714
9    47    Nail Therapy      7037

1 answer:

Answer 0 (score: 1)

You need to add PARTITION BY on the Major Focus Area column inside the OVER clause. Here is the modified TSQL:

CREATE VIEW MFA AS
  -- PARTITION BY restarts the numbering at 1 for every focus area,
  -- so no per-area WHERE filter (or per-area view) is needed
  SELECT ROW_NUMBER() OVER(PARTITION BY fa.[Major Focus Area] ORDER BY fa.[Rank] ASC) AS Row
        ,*
  FROM Rank_MajorAreas AS fa
  ORDER BY [Major Focus Area], Row ASC
  OFFSET 0 ROWS;
-- CREATE VIEW must be the only statement in its batch
GO

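;WITH Sample( Row, fa ) AS
(
  -- anchor: one row per focus area, seeded with that area's row count
  SELECT COUNT(*) AS Row, [Major Focus Area] AS fa FROM MFA GROUP BY [Major Focus Area]
    UNION ALL
  -- recursive step: halve the row number (int / int truncates) and stop at 1
  SELECT Row / 2, fa
    FROM Sample
    WHERE Row > 1
)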

-- keep only the halved row numbers, matched within the same focus area
SELECT mfa.Row, mfa.[Rank], mfa.[Major Focus Area], mfa.ID
  FROM MFA AS mfa
  INNER JOIN Sample AS s ON s.Row = mfa.Row AND s.fa = mfa.[Major Focus Area]
  ORDER BY mfa.[Major Focus Area], mfa.Row ASC;

SQL fiddle
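For anyone without a SQL Server instance at hand, the same approach (partitioned ROW_NUMBER, a recursive CTE seeded with one count per focus area, then a join) can be tried as a self-contained sketch using Python's built-in `sqlite3` module. This is an assumption-laden port, not the answer's exact code: SQLite stands in for SQL Server (window functions need SQLite 3.25 or later), it spells the CTE `WITH RECURSIVE`, uses a CTE instead of a view, and the names `row_num`, `halvings`, `ROWS`, `QUERY` and `subsample` are hypothetical. The recursion stops at `row_num > 1`, so no unmatched 0 row is generated.

```python
import sqlite3

# The question's sample data: (Rank, Major Focus Area, ID)
ROWS = [
    (1, "Welfare", 71366), (2, "Welfare", 70415), (3, "Truck Driving", 70423),
    (4, "Peasant's Office", 74566), (5, "Peasant's Office", 71560), (6, "Nail Therapy", 77497),
    (7, "Truck Driving", 76193), (8, "Truck Driving", 79226), (9, "Truck Driving", 70222),
    (10, "Welfare", 77336), (11, "Truck Driving", 70823), (12, "Welfare", 77096),
    (13, "Welfare", 71335), (14, "Nail Therapy", 73551), (15, "Welfare", 72146),
    (16, "Truck Driving", 74023), (17, "Welfare", 71546), (18, "Nail Therapy", 74755),
    (19, "Peasant's Office", 77834), (20, "Welfare", 75667), (21, "Peasant's Office", 71342),
    (22, "Peasant's Office", 77457), (23, "Peasant's Office", 77923), (24, "Welfare", 76508),
    (25, "Welfare", 75714), (26, "Welfare", 73654), (27, "Welfare", 75753),
    (28, "Truck Driving", 71481), (29, "Truck Driving", 79424), (30, "Peasant's Office", 76143),
    (31, "Truck Driving", 74076), (32, "Nail Therapy", 78714), (33, "Nail Therapy", 79924),
    (34, "Welfare", 71482), (35, "Welfare", 70050), (36, "Welfare", 76053),
    (37, "Nail Therapy", 79591), (38, "Peasant's Office", 75197), (39, "Nail Therapy", 74104),
    (40, "Welfare", 72891), (41, "Truck Driving", 73621), (42, "Peasant's Office", 71713),
    (43, "Welfare", 71979), (44, "Peasant's Office", 71601), (45, "Peasant's Office", 73928),
    (46, "Nail Therapy", 71759), (47, "Nail Therapy", 70379), (48, "Welfare", 71215),
    (49, "Truck Driving", 70908), (50, "Welfare", 71989),
]

QUERY = """
WITH RECURSIVE
  mfa AS (
    -- number rows 1..n separately inside each focus area, ordered by rank
    SELECT ROW_NUMBER() OVER (PARTITION BY [Major Focus Area] ORDER BY [Rank]) AS row_num,
           [Rank], [Major Focus Area], ID
    FROM Rank_MajorAreas
  ),
  halvings(row_num, fa) AS (
    -- anchor: one row per focus area, seeded with that area's row count
    SELECT COUNT(*), [Major Focus Area] FROM mfa GROUP BY [Major Focus Area]
    UNION ALL
    -- recursive step: integer-halve until the row number reaches 1
    SELECT row_num / 2, fa FROM halvings WHERE row_num > 1
  )
SELECT m.row_num, m.[Rank], m.[Major Focus Area], m.ID
FROM mfa AS m
JOIN halvings AS h ON h.row_num = m.row_num AND h.fa = m.[Major Focus Area]
ORDER BY m.[Major Focus Area], m.row_num
"""


def subsample(rows):
    """Load rows into an in-memory SQLite table and return the stacked subsamples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Rank_MajorAreas ([Rank] INTEGER, [Major Focus Area] TEXT, ID INTEGER)")
    con.executemany("INSERT INTO Rank_MajorAreas VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in subsample(ROWS):
    print(row)
```

Because every focus area gets its own seed row in the anchor member, a single recursive CTE covers all categories at once, and the join returns all subsamples as one result set, which is what parts a) and b) of the question ask for.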