I want to get the percentile distribution of a column. My query looks like this:
#StandardSQL
SELECT
PERCENTILE_CONT(age, 0) OVER() AS min,
PERCENTILE_CONT(age, 0.05) OVER() AS percentile5,
PERCENTILE_CONT(age, 0.25) OVER() AS percentile25,
PERCENTILE_CONT(age, 0.50) OVER() AS percentile50,
PERCENTILE_CONT(age, 0.75) OVER() AS percentile75,
PERCENTILE_CONT(age, 0.95) OVER() AS percentile95,
PERCENTILE_CONT(age, 1) OVER() AS max
FROM `data`
But I keep getting this error:
The query could not be executed in the allotted memory.
OVER() operator used too much memory..
I also tried computing one percentile at a time, like this:
select PERCENTILE_CONT(age, 0.05) OVER() AS percentile5
from data
but that produces the same error.
My table has 30 million rows. Is there any way to optimize this?
Thanks.
Answer 0: (score: 2)
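Presumably, age does not have many distinct values. If so, you can aggregate the data first and then compute what you need on the much smaller result.
For example: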
select min(age) as min,
       -- largest age whose first row starts before the 5% position
       max(case when running_cnt - cnt < 0.05 * total_cnt
                then age
           end) as percentile_05,
       -- same idea for the median
       max(case when running_cnt - cnt < 0.5 * total_cnt
                then age
           end) as percentile_50,
       max(age) as max
from (select age, count(*) as cnt,
             sum(count(*)) over (order by age) as running_cnt,
             sum(count(*)) over () as total_cnt
      from `data`
      group by age
     ) d
Answer 1: (score: 0)
I would sort your data and then compute the percentile rank manually. If you need interpolation, that can also be done by hand...
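As a rough sketch of what computing this by hand with interpolation might look like, the query below works on per-age counts rather than on all 30 million rows, so the window functions only see one row per distinct age. It assumes that age is numeric, that it has relatively few distinct values, and that the intended interpolation is linear at the 0-based position p * (N - 1). The names first_pos, last_pos, pos, and the CTE names are illustrative and not from the original posts.

```sql
-- Sketch only: interpolated 5th percentile computed from per-age counts.
-- Assumes `data` has a numeric `age` column with few distinct values.
WITH ranked AS (
  SELECT
    age,
    -- 0-based positions of the first and last row holding this age
    -- in the sorted data
    SUM(COUNT(*)) OVER (ORDER BY age) - COUNT(*) AS first_pos,
    SUM(COUNT(*)) OVER (ORDER BY age) - 1 AS last_pos,
    SUM(COUNT(*)) OVER () AS total_cnt
  FROM `data`
  WHERE age IS NOT NULL
  GROUP BY age
),
positioned AS (
  SELECT
    age, first_pos, last_pos,
    0.05 * (total_cnt - 1) AS pos  -- fractional 0-based target position
  FROM ranked
)
SELECT
  -- value at FLOOR(pos), plus the fractional share of the gap
  -- up to the value at CEIL(pos)
  MAX(IF(FLOOR(pos) BETWEEN first_pos AND last_pos, age, NULL))
  + (MAX(IF(CEIL(pos) BETWEEN first_pos AND last_pos, age, NULL))
     - MAX(IF(FLOOR(pos) BETWEEN first_pos AND last_pos, age, NULL)))
    * ANY_VALUE(pos - FLOOR(pos)) AS percentile5
FROM positioned
```

For other percentiles, change the 0.05 constant. Floating-point rounding of pos can shift the result slightly when the target position lands exactly on a row boundary, so it is worth checking the output against a small sample where PERCENTILE_CONT does run.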