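I am trying to write a for loop in Python and connect it to Snowflake, because Snowflake does not support loops. I want to select a number of random rows from different AgeGroups, e.g. 1500 rows from age group "30-40", 1200 rows from age group "40-50", and 875 rows from age group "50-60".
Any ideas, or is there an alternative to loops in Snowflake?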
Answer 0: (score: 2)
Have you looked at Snowflake's stored procedures? They are written in JavaScript and let you loop natively inside Snowflake:
https://docs.snowflake.net/manuals/sql-reference/stored-procedures-overview.html
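Answer 1: (score: 0)
If you want to draw n random samples from each group, you can build a subquery that assigns a row number in random order within each group, and then select the first n rows from each group.
If you have a table like this: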
USER DATE
1 2018-11-04
1 2018-11-04
1 2018-12-07
1 2018-10-09
1 2018-10-09
1 2018-11-07
1 2018-11-09
1 2018-11-09
2 2019-11-02
2 2019-10-02
2 2019-11-03
2 2019-11-06
3 2019-11-10
3 2019-11-13
3 2019-11-15
This query returns two random rows each for users 2 and 3, and three random rows for user 1.
SELECT User, Date
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY User ORDER BY RANDOM()) as random_row
FROM Users)
WHERE
(User = 3 AND random_row < 3) OR
(User = 2 AND random_row < 3) OR
(User = 1 AND random_row < 4);
So in your case, partition by and filter on age_group instead of User.
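Since the question mentions Python, here is a minimal sketch of the same idea outside the database: shuffle the rows inside each group, then keep the first n. The function name `sample_per_group` and the quota dictionary are made up for illustration, and the rows are the ones from the table above.

```python
import random
from collections import defaultdict


def sample_per_group(rows, group_of, quotas, seed=None):
    """Return up to quotas[group] randomly chosen rows from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)

    picked = []
    for group, members in groups.items():
        # Plays the role of ORDER BY RANDOM() inside the partition.
        shuffled = sorted(members, key=lambda _: rng.random())
        # Plays the role of the random_row filter; no quota means no rows.
        picked.extend(shuffled[: quotas.get(group, 0)])
    return picked


users = [
    (1, "2018-11-04"), (1, "2018-11-04"), (1, "2018-12-07"), (1, "2018-10-09"),
    (1, "2018-10-09"), (1, "2018-11-07"), (1, "2018-11-09"), (1, "2018-11-09"),
    (2, "2019-11-02"), (2, "2019-10-02"), (2, "2019-11-03"), (2, "2019-11-06"),
    (3, "2019-11-10"), (3, "2019-11-13"), (3, "2019-11-15"),
]

# Three rows for user 1, two rows each for users 2 and 3.
sample = sample_per_group(users, lambda row: row[0], {1: 3, 2: 2, 3: 2}, seed=42)
for user, date in sample:
    print(user, date)
```

This only illustrates the logic. For large tables it is usually better to let the SQL query do the sampling so the data never leaves Snowflake.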
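Answer 2: (score: 0)
Snowflake supports random and deterministic table sampling. For example:
Return a sample of a table in which each row has a 10% probability of being included: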
SELECT * FROM testtable SAMPLE (10);
https://docs.snowflake.net/manuals/sql-reference/constructs/sample.html
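To make "each row has a 10% probability" concrete, here is a rough Python sketch of that kind of per-row sampling. It is not how Snowflake implements SAMPLE, and `bernoulli_sample` is a made-up name. The point is that the sample size varies from run to run, so this form alone will not give you exactly 1500 rows.

```python
import random


def bernoulli_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


rows = list(range(10_000))
sample = bernoulli_sample(rows, 10, seed=1)
# Roughly 1,000 rows, but the exact count depends on the random draws.
print(len(sample))
```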
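Answer 3: (score: 0)
What do you mean by "Snowflake has no loops"? SQL has "loops" if you know where to find them...
The following query does what you are asking for: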
WITH POPULATION AS ( /* 10,000 persons with random age 0-99 */
SELECT 'Person ' || SEQ2() ID, ABS(RANDOM()) % 100 AGE
FROM TABLE(GENERATOR(ROWCOUNT => 10000))
)
SELECT
ID,
AGE,
CASE
WHEN AGE < 30 THEN '0-30'
WHEN AGE < 40 THEN '30-40'
WHEN AGE < 50 THEN '40-50'
WHEN AGE < 60 THEN '50-60'
ELSE '60-100'
END AGE_GROUP,
ROW_NUMBER() OVER (PARTITION BY AGE_GROUP ORDER BY RANDOM()) DRAW_ORDER
FROM POPULATION
QUALIFY DRAW_ORDER <= DECODE(AGE_GROUP, '30-40', 1500, '40-50', 1200, '50-60', 875, 0);
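For anyone who wants to try the logic without a Snowflake account, here is a small Python sketch of the same steps: bucket ages into groups, look up a quota per group with a default of 0, and draw at most that many rows. All names here (`age_group`, `draw`, `QUOTAS`) are invented for the sketch. One thing it makes visible: with 10,000 people spread over ages 0-99, a ten-year group holds about 1,000 people, so the 1500 quota cannot be filled and the whole group comes back, just as `DRAW_ORDER <= 1500` would return every row of a smaller group.

```python
import random
from bisect import bisect_right

BOUNDS = [30, 40, 50, 60]  # exclusive upper edges, as in the CASE expression
LABELS = ["0-30", "30-40", "40-50", "50-60", "60-100"]
QUOTAS = {"30-40": 1500, "40-50": 1200, "50-60": 875}


def age_group(age):
    """Map an age to its group label, mirroring the CASE expression."""
    return LABELS[bisect_right(BOUNDS, age)]


def draw(ages, seed=None):
    """Return {group: randomly drawn (person_id, age) pairs} per quota."""
    rng = random.Random(seed)
    by_group = {}
    for person_id, age in enumerate(ages):
        by_group.setdefault(age_group(age), []).append((person_id, age))

    drawn = {}
    for group, members in by_group.items():
        quota = QUOTAS.get(group, 0)  # like the DECODE default of 0
        # A group smaller than its quota is returned in full.
        drawn[group] = rng.sample(members, min(quota, len(members)))
    return drawn


population_rng = random.Random(0)
ages = [population_rng.randrange(100) for _ in range(10_000)]
drawn = draw(ages, seed=0)
for group in sorted(drawn):
    print(group, len(drawn[group]))
```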
Addendum:
waldente pointed out that a simpler and more efficient approach is to use SAMPLE:
WITH
POPULATION_30_40 AS (SELECT * FROM POPULATION WHERE AGE >= 30 AND AGE < 40),
POPULATION_40_50 AS (SELECT * FROM POPULATION WHERE AGE >= 40 AND AGE < 50),
POPULATION_50_60 AS (SELECT * FROM POPULATION WHERE AGE >= 50 AND AGE < 60)
SELECT * FROM POPULATION_30_40 SAMPLE(1500 ROWS) UNION ALL
SELECT * FROM POPULATION_40_50 SAMPLE(1200 ROWS) UNION ALL
SELECT * FROM POPULATION_50_60 SAMPLE(875 ROWS)
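If the age ranges and row counts change often, the Python loop from the question can be used just to generate this query text rather than to fetch rows one group at a time. A minimal sketch, assuming a trusted table name and a column called AGE; `build_sample_query` is a hypothetical helper, and running the result against Snowflake is not shown here.

```python
def build_sample_query(table, quotas):
    """Build a WITH ... UNION ALL query with one fixed-size SAMPLE per range.

    quotas maps (low, high) age ranges to the number of rows wanted.
    The table name is interpolated as-is, so it must come from trusted code.
    """
    ctes, selects = [], []
    for (low, high), n in quotas.items():
        low, high, n = int(low), int(high), int(n)  # keep only plain integers
        name = f"{table}_{low}_{high}"
        ctes.append(
            f"{name} AS (SELECT * FROM {table} WHERE AGE >= {low} AND AGE < {high})"
        )
        selects.append(f"SELECT * FROM {name} SAMPLE({n} ROWS)")
    return "WITH\n" + ",\n".join(ctes) + "\n" + " UNION ALL\n".join(selects)


query = build_sample_query(
    "POPULATION", {(30, 40): 1500, (40, 50): 1200, (50, 60): 875}
)
print(query)
```

The generated text has the same shape as the query above, so the whole job is still a single statement sent to Snowflake.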