Given the following query in Redshift:
select
distinct cast(joinstart_ev_timestamp as date) as session_date,
PERCENTILE_DISC(0.02) WITHIN GROUP (ORDER BY join_time) over(partition by
trunc(joinstart_ev_timestamp))/1000 as mini,
median(join_time) over(partition by trunc(joinstart_ev_timestamp))/1000 as jt,
product_name as product,
endpoint as endpoint
from qe_datawarehouse.join_session_fact
where
cast(joinstart_ev_timestamp as date) between date '2018-01-18' and date '2018-01-30'
and lower(product_name) LIKE 'gotoTest%'
and join_time > 0 and join_time <= 600000 and join_time is not null
and audio_connect_time >= 0
and (entrypoint_access_time >= 0 or entrypoint_access_time is null)
and (panel_connect_time >= 0 or panel_connect_time is null) and version = 'V2'
I need to convert the query above to the equivalent Presto syntax. The Presto query I wrote is:
select
distinct cast(joinstart_ev_timestamp as date) as session_date,
PERCENTILE_DISC( WITHIN GROUP (ORDER BY cast(join_time as double))
over(partition by cast(joinstart_ev_timestamp as date) )/1000 as mini,
approx_percentile(cast(join_time as double),0.50) over (partition by
cast(joinstart_ev_timestamp as date)) /1000 as jt,
product_name as product,
endpoint as endpoint
from datawarehouse.join_session_fact
where
cast(joinstart_ev_timestamp as date) between date '2018-01-18' and date '2018-01-30'
and lower(product_name) LIKE 'gotoTest%'
and join_time > 0 and join_time <= 600000 and join_time is not null
and audio_connect_time >= 0
and (entrypoint_access_time >= 0 or entrypoint_access_time is null)
and (panel_connect_time >= 0 or panel_connect_time is null) and version = 'V2'
Everything here works except these lines, which raise an error:
PERCENTILE_DISC( WITHIN GROUP (ORDER BY cast(join_time as double))
over(partition by cast(joinstart_ev_timestamp as date) )/1000 as mini,
What is the equivalent Presto syntax for this?
Answer 0 (score: 0)
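If Presto supported nested window functions, you could use NTH_VALUE with p * COUNT(*) OVER (PARTITION BY ...) as the offset to find the value at the "p"-th percentile in the window. Since Presto does not support this, you need to join to a subquery that counts the records in each window instead: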
SELECT
my_table.window_column,
/* Replace :p with the desired percentile (in your case, 0.02) */
NTH_VALUE(my_table.ordered_column, :p*subquery.records_in_window)
OVER (PARTITION BY my_table.window_column ORDER BY my_table.ordered_column ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM my_table
JOIN (
SELECT
window_column,
COUNT(*) AS records_in_window
FROM my_table
GROUP BY window_column
) subquery ON subquery.window_column = my_table.window_column
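The above is conceptually close but fails, because :p*subquery.records_in_window is a floating-point number and the offset must be an integer. There are a few ways to handle this. For example, if you are looking for the median, simply rounding to the nearest integer works. If you are looking for the 2nd percentile, rounding will not work, because it will often give you 0 and offsets start at 1. In that case, rounding up to the next integer with a ceiling function is probably better.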
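As a rough sketch of the ceiling approach, the query below reuses the placeholder names from the example above (my_table, window_column, ordered_column, all hypothetical) and hard-codes the percentile as 0.02. It assumes Presto accepts a per-row expression as the nth_value offset. I have not run it against a real cluster, so treat it as a starting point rather than a verified translation. For p > 0, row ceil(p * n) in sorted order should be the same value PERCENTILE_DISC(p) selects.

```sql
-- Sketch only: table and column names are placeholders, not a real schema.
SELECT DISTINCT
  t.window_column,
  -- ceil() rounds the fractional rank up, so the offset is never 0;
  -- the cast turns it into the integer offset that nth_value expects.
  nth_value(t.ordered_column, CAST(ceil(0.02 * c.records_in_window) AS bigint))
    OVER (
      PARTITION BY t.window_column
      ORDER BY t.ordered_column
      -- Full-partition frame, so every row sees the same nth value.
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS p02
FROM my_table t
JOIN (
  -- Number of rows per window, used to turn the percentile into a rank.
  SELECT window_column, count(*) AS records_in_window
  FROM my_table
  GROUP BY window_column
) c ON c.window_column = t.window_column
```

The DISTINCT collapses the result to one row per window, since the window function repeats the same value on every row of a partition.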
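Answer 1 (score: 0)
I was looking into medians in Presto and found a solution that works for me:
For example, I have a join table A_join_B with the columns A_id and B_id.
I want to find the median number of A's related to a single B: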
SELECT APPROX_PERCENTILE(count, 0.5) FROM ( SELECT COUNT(*) AS count, B_id FROM A_join_B GROUP BY B_id );
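The same function can be pointed at other percentiles. As a hedged sketch for the question's case, the failing PERCENTILE_DISC line could be replaced with approx_percentile at 0.02, mirroring the way the question already computes the median. The WHERE clause is shortened to the date filter here for brevity. approx_percentile returns an approximation, so the result may differ slightly from Redshift's exact PERCENTILE_DISC value.

```sql
-- Sketch only: approximate 2nd percentile per day, not an exact PERCENTILE_DISC.
SELECT DISTINCT
  CAST(joinstart_ev_timestamp AS date) AS session_date,
  approx_percentile(CAST(join_time AS double), 0.02)
    OVER (PARTITION BY CAST(joinstart_ev_timestamp AS date)) / 1000 AS mini
FROM datawarehouse.join_session_fact
WHERE CAST(joinstart_ev_timestamp AS date)
      BETWEEN date '2018-01-18' AND date '2018-01-30'
```

If an exact discrete percentile is required, an nth_value-based approach is the safer route.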