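I have a large percentile table generated with the PERCENT_RANK() function in BigQuery. The output contains many rows whose percentile values are very close to one another. I want to return only 10 rows: the values at the 100th, 90th, 80th, 70th, etc. percentiles.
More specifically, I am looking for the number closest to the 80th percentile (.8), given the following sample values: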
0.81876543 0.81123141 0.80121214 0.80012123 0.80001213 0.80001112 0.79999121
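In this case I would expect .80001112 as the value closest to .8 (strictly by absolute distance, .79999121 is slightly closer: 0.00000879 versus 0.00001112).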
Is there a SQL function I can use to return only the ten values closest to those percentiles?
Answer 0 (score: 1)
The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.percentiles` AS (
  -- sample data standing in for the real percentile table
  SELECT .81876543 percentile UNION ALL
  SELECT .81123141 UNION ALL
  SELECT .80121214 UNION ALL
  SELECT .80012123 UNION ALL
  SELECT .80001213 UNION ALL
  SELECT .80001112 UNION ALL
  SELECT .79999121
), targets AS (
  -- the percentiles to look up
  SELECT check
  FROM UNNEST([1, .9, .8, .7, .6, .5, .4, .3, .2, .1]) check
)
SELECT
  check,
  -- up to 10 values nearest to each target, closest first
  ARRAY_AGG(percentile ORDER BY ABS(percentile - check) LIMIT 10) val
FROM `project.dataset.percentiles`
CROSS JOIN targets
WHERE ABS(percentile - check) < .05  -- ignore values more than .05 away from the target
GROUP BY check
ORDER BY check
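The query above returns up to 10 of the closest values (within .05) for each target percentile: 100%, 90%, 80%, and so on. Targets with no value within .05 produce no row.
If you need only one value per percentile, use the following query instead: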
#standardSQL
WITH `project.dataset.percentiles` AS (
  -- sample data standing in for the real percentile table
  SELECT .81876543 percentile UNION ALL
  SELECT .81123141 UNION ALL
  SELECT .80121214 UNION ALL
  SELECT .80012123 UNION ALL
  SELECT .80001213 UNION ALL
  SELECT .80001112 UNION ALL
  SELECT .79999121
), targets AS (
  -- the percentiles to look up
  SELECT check
  FROM UNNEST([1, .9, .8, .7, .6, .5, .4, .3, .2, .1]) check
)
SELECT
  check,
  -- keep only the single nearest value and unwrap it from the array
  ARRAY_AGG(percentile ORDER BY ABS(percentile - check) LIMIT 1)[SAFE_OFFSET(0)] val
FROM `project.dataset.percentiles`
CROSS JOIN targets
WHERE ABS(percentile - check) < .05  -- ignore values more than .05 away from the target
GROUP BY check
ORDER BY check
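If you need the whole row for each target rather than just the percentile value, a window function may be a convenient alternative. The following is only a sketch under assumptions: the `id` column and its values are hypothetical stand-ins for whatever other columns your real table has, and the sample data is the same as above.

```sql
#standardSQL
WITH percentiles AS (
  -- sample data; `id` is a hypothetical extra column used for illustration
  SELECT 'a' AS id, .81876543 AS percentile UNION ALL
  SELECT 'b', .81123141 UNION ALL
  SELECT 'c', .80121214 UNION ALL
  SELECT 'd', .80012123 UNION ALL
  SELECT 'e', .80001213 UNION ALL
  SELECT 'f', .80001112 UNION ALL
  SELECT 'g', .79999121
), ranked AS (
  SELECT
    check,
    id,
    percentile,
    -- rank rows by distance to each target; 1 = nearest
    ROW_NUMBER() OVER (
      PARTITION BY check
      ORDER BY ABS(percentile - check)
    ) AS rn
  FROM percentiles
  CROSS JOIN UNNEST([1, .9, .8, .7, .6, .5, .4, .3, .2, .1]) AS check
  WHERE ABS(percentile - check) < .05  -- same tolerance as the queries above
)
SELECT check, id, percentile
FROM ranked
WHERE rn = 1
ORDER BY check
```

With the sample values, only the .8 target has candidates within the tolerance, and the nearest by absolute distance is .79999121. If you instead want the nearest value that is not below the target, adding `AND percentile >= check` to the inner WHERE clause should select .80001112.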