假设我们有一个带有'A'列的表,其值为0到N.我想为每列具有相同值的30%选择“A”列。
So if I have this:
A| B
-------
0 hello
0 test
0 hi
1 blah1
1 blah2
1 blah3
1 blah4
1 blah5
1 blah6
Result:
A| B
-------
0 hello
1 blah1
1 blah4
它可能是blah1或任何其他不是blah4的blah,而blah4可能是任何其他不是blah1的blah,基本上它可能是随机的或跳过。
顺便说一下,实际的桌子是巨大的,谈论太字节,所以想想性能。
答案 0 :(得分:6)
尝试这样的事情:
DECLARE @YourTable table (A int, b varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')
;WITH NumberedRows AS
( SELECT
A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY A,B) AS RowNumber
FROM @YourTable
)
, GroupCounts AS
( SELECT
A,MAX(RowNumber) AS MaxA
FROM NumberedRows
GROUP BY A
)
SELECT
n.a,n.b
FROM NumberedRows n
INNER JOIN GroupCounts c ON n.A=c.A
WHERE n.RowNUmber<=(c.MaxA+1)*0.3
输出:
a b
----------- ----------
0 hello
1 blah1
1 blah2
(3 row(s) affected)
编辑基于Andriy M的评论中的好主意
;WITH NumberedRows AS
( SELECT
A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY A,B) AS RowNumber
,COUNT(*) OVER (PARTITION BY A) AS TotalOf
FROM @YourTable
)
SELECT
n.a,n.b
FROM NumberedRows n
WHERE n.RowNumber<=(n.TotalOf+1)*0.3
ORDER BY A
输出:
a b
----------- ----------
0 hello
1 blah1
1 blah2
(3 row(s) affected)
编辑这里是“随机”行,使用Andriy M的想法:
DECLARE @YourTable table (A int, b varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')
;WITH NumberedRows AS
( SELECT
A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY newid()) AS RowNumber
FROM @YourTable
)
, GroupCounts AS (SELECT A,COUNT(A) AS MaxA FROM NumberedRows GROUP BY A)
SELECT
n.A,n.B
FROM NumberedRows n
INNER JOIN GroupCounts c ON n.A=c.A
WHERE n.RowNUmber<=(c.MaxA+1)*0.3
ORDER BY n.A
输出:
a b
----------- ----------
0 hi
1 blah3
1 blah6
(3 row(s) affected)
答案 1 :(得分:1)
这只使用一个子查询,因此只能通过你的集合。
SELECT a
, b
FROM
(
SELECT A
, b
, ROW_NUMBER()
OVER( PARTITION BY A
ORDER BY b
) r
, COUNT(b)
OVER( PARTITION BY A
) ct
FROM @YourTable
) n
WHERE n.r <= n.ct * 0.3
就像这样,虽然如果少于10个并且“额外”被发布到第一个箱子,它总是返回前3个。:
SELECT A
, b
FROM
(
SELECT A
, b
, NTILE(10)
OVER( PARTITION BY a
ORDER BY b
) tens
FROM @YourTable
) n
WHERE tens <= 3;