Vertica - is there a LATERAL VIEW feature?

Time: 2017-01-17 19:15:27

Tags: hive vertica

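I need to pivot a matrix in order to do TIMESERIES interpolation / gap filling, and I would like to avoid the messy and inefficient UNION ALL approach. Is there anything in Vertica like Hive's LATERAL VIEW EXPLODE functionality?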

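Edit: @marcothesane - thanks for your interesting scenario. I like your approach to the interpolation. I'll play with it some more and see how it goes. Looks promising.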

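FYI, here is the solution I came up with. My scenario is that I am trying to see memory usage by query (and by user, resource pool, etc.; basically I am trying to get a cost metric). I need the interpolation so that I can see total usage at any point in time. So here is my query, which time-slices by second and then aggregates by minute to get "Megabyte_Seconds" per minute.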

-- one row per query request, with start and end truncated to whole seconds
with qry_cte as
(
  select
    session_id
  , request_id
  , date_trunc('second', start_timestamp) as dat_str
  , timestampadd('ss'
      , ceiling(request_duration_ms/1000)::int
      , date_trunc('second', start_timestamp)
      ) as dat_end
  , ceiling(request_duration_ms/1000)::int as secs
  , memory_acquired_mb
  from query_requests
  where request_type = 'QUERY'
  and request_duration_ms > 0
  and memory_acquired_mb > 0
)

-- sum the per-second slices into megabyte-seconds per minute
select date_trunc('minute', slice_time) as dat_minute
-- the '|' separator keeps distinct (session, request) pairs from concatenating to the same string
, count(distinct session_id || '|' || request_id::varchar) as queries
, sum(memory_acquired_mb) as mb_seconds
from (
  -- one row per second between each query's start and end
  select session_id, request_id, slice_time, ts_first_value(memory_acquired_mb) as memory_acquired_mb
  from (
    -- start and end of each query as separate rows, so TIMESERIES has a gap to fill
    select session_id, request_id, dat_str as dat, memory_acquired_mb from qry_cte
    union all
    select session_id, request_id, dat_end as dat, memory_acquired_mb from qry_cte
  ) endpoints
  timeseries slice_time as '1 second' over (partition by session_id, request_id order by dat)
) slices
group by 1 order by 1 desc
;
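
To see what the start/end UNION ALL buys, it may help to run the TIMESERIES step on a single request. The sketch below is only an illustration: the CTE name endpoints and the values 'sess-a', 1, 100 MB and the two timestamps are made up and do not come from query_requests.

```sql
-- Hypothetical single request: 100 MB held from 10:00:00 to 10:00:03.
WITH endpoints(session_id, request_id, dat, memory_acquired_mb) AS (
          SELECT 'sess-a', 1, '2017-01-17 10:00:00'::TIMESTAMP, 100
UNION ALL SELECT 'sess-a', 1, '2017-01-17 10:00:03'::TIMESTAMP, 100
)
SELECT
  session_id
, request_id
, slice_time
  -- with no second argument, TS_FIRST_VALUE uses constant interpolation
, TS_FIRST_VALUE(memory_acquired_mb) AS memory_acquired_mb
FROM endpoints
TIMESERIES slice_time AS '1 second' OVER (
  PARTITION BY session_id, request_id ORDER BY dat
)
ORDER BY slice_time;
```

This should return one row per second between the two endpoints, each carrying the 100 MB value. If the slice at the end timestamp is included as well, a request with secs = 3 would contribute four one-second rows, which may be worth checking against the mb_seconds total.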

1 answer:

Answer 0: (score: 2)

I actually have a handy scenario that does just what you are asking for:

Out of this:

id|day_strt           |sales_01 |sales_02 |sales_03 |sales_04 |sales_05 |sales_06
 1|2016-01-19 08:00:00| 1,842.25| 5,449.40|-        |39,776.86|-        | 9,424.10
 2|2016-01-19 08:00:00|73,810.66|-        | 9,867.70|-        |76,723.91|95,605.14

make this:

id|day_strt           |sales_01 |sales_02 |sales_03 |sales_04 |sales_05 |sales_06
 1|2016-01-19 08:00:00| 1,842.25| 5,449.40|22,613.13|39,776.86|24,600.48| 9,424.10
 2|2016-01-19 08:00:00|73,810.66|41,839.18| 9,867.70|43,295.81|76,723.91|95,605.14

01 to 06 stand for the n-th hour of recorded sales, counting from 08:00.
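
The filled-in values are consistent with linear interpolation between the neighbouring hours: each single-hour gap ends up at the midpoint of the values on either side. A quick arithmetic check, using only numbers from the two tables above (the column aliases are just labels for this illustration):

```sql
-- Each filled value is the midpoint of its two hourly neighbours.
SELECT
  ((5449.40  + 39776.86) / 2)::NUMERIC(7,2) AS id1_sales_03  -- between sales_02 and sales_04
, ((39776.86 +  9424.10) / 2)::NUMERIC(7,2) AS id1_sales_05  -- between sales_04 and sales_06
, ((73810.66 +  9867.70) / 2)::NUMERIC(7,2) AS id2_sales_02  -- between sales_01 and sales_03
, (( 9867.70 + 76723.91) / 2)::NUMERIC(7,2) AS id2_sales_04  -- between sales_03 and sales_05
;
```

The midpoints work out to 22,613.13, 24,600.48, 41,839.18 and 43,295.805; the last one shows up as 43,295.81 in the table once rounded to two decimals.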

Below is the whole scenario, including the initial input data.

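  1. The input data as SELECT ... UNION ALL SELECT ....
  2. A table of 6 integers to CROSS JOIN with the table from step 1.
  3. Pivot vertically: cross join the input with the 6 integers and, in a CASE expression, output the n-th sales column depending on the index. Finally, filter out the rows where that same CASE expression evaluates to NULL.
  4. Fill the gaps with the TIMESERIES clause and linear interpolation, for the sales figure as well as for the index column.
  5. Pivot everything horizontally again in the final query.
  6. This performs better than a UNION ALL over all columns of the table, I can assure you.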

    Here goes:

    WITH
    -- input 
    input(id,day_strt,sales_01,sales_02,sales_03,sales_04,sales_05,sales_06) AS (
              SELECT 1,'2016-01-19 08:00:00'::TIMESTAMP(0), 1842.25, 5449.40 ,NULL::INT,39776.86 ,NULL::INT, 9424.10
    UNION ALL SELECT 2,'2016-01-19 08:00:00'::TIMESTAMP(0),73810.66 ,NULL::INT, 9867.70 ,NULL::INT,76723.91 ,95605.14
    )
    -- debug
    -- SELECT * FROM input;
    ,
    -- 6 hours to pivot vertically -> 6 integers
    six_idxs(idx) AS (
              SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    UNION ALL SELECT 5
    UNION ALL SELECT 6
    )
    ,
    -- pivot input vertically and remove rows with null measures
    -- (could probably add the TIMESERIES clause here directly,
    -- but less readable and maintainable)
    vert_pivot AS (
    SELECT
      id
    , idx 
    , TIMESTAMPADD(HOUR,idx-1,day_strt)::TIMESTAMP(0) AS sales_ts
    , CASE idx
        WHEN 1 THEN  sales_01
        WHEN 2 THEN  sales_02
        WHEN 3 THEN  sales_03
        WHEN 4 THEN  sales_04
        WHEN 5 THEN  sales_05
        WHEN 6 THEN  sales_06
      END AS sales
    FROM input
    CROSS JOIN six_idxs
    WHERE (
        CASE idx
          WHEN 1 THEN  sales_01
          WHEN 2 THEN  sales_02
          WHEN 3 THEN  sales_03
          WHEN 4 THEN  sales_04
          WHEN 5 THEN  sales_05
          WHEN 6 THEN  sales_06
        END
      ) IS NOT NULL
    )
    -- debug:
    -- SELECT * FROM vert_pivot;
    ,
    -- gap filling and interpolation
    gaps_filled AS (
    SELECT
      id
    , TS_FIRST_VALUE(idx,'LINEAR')   AS idx
    , tm_sales_ts::TIMESTAMP(0) AS sales_ts
    , TS_FIRST_VALUE(sales,'LINEAR') AS sales
    FROM vert_pivot
    TIMESERIES tm_sales_ts AS '1 HOUR' OVER(
      PARTITION BY id ORDER BY sales_ts
      )
    )
    -- debug
    -- SELECT * FROM gaps_filled ORDER BY 1,2;
    -- pivot horizontally; final query
    SELECT
      id
    , MIN(sales_ts) AS day_strt
    , SUM(CASE idx WHEN 1 THEN sales END)::NUMERIC(7,2) AS sales_01
    , SUM(CASE idx WHEN 2 THEN sales END)::NUMERIC(7,2) AS sales_02
    , SUM(CASE idx WHEN 3 THEN sales END)::NUMERIC(7,2) AS sales_03
    , SUM(CASE idx WHEN 4 THEN sales END)::NUMERIC(7,2) AS sales_04
    , SUM(CASE idx WHEN 5 THEN sales END)::NUMERIC(7,2) AS sales_05
    , SUM(CASE idx WHEN 6 THEN sales END)::NUMERIC(7,2) AS sales_06
    FROM gaps_filled
    GROUP BY id
    ORDER BY id
    ;
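
The vert_pivot step is the part that stands in for Hive's LATERAL VIEW EXPLODE. As a possible variation, and only a sketch, the CASE expression can be written once and filtered in an outer query instead of being repeated in the WHERE clause. The names three_idxs and unpivoted, and the cut-down three-column input, are assumptions for this illustration.

```sql
WITH
-- cut-down input: one row, three hourly columns, one of them NULL
input(id, day_strt, sales_01, sales_02, sales_03) AS (
  SELECT 1, '2016-01-19 08:00:00'::TIMESTAMP(0), 1842.25, NULL::NUMERIC(7,2), 9867.70
)
,
three_idxs(idx) AS (
            SELECT 1
  UNION ALL SELECT 2
  UNION ALL SELECT 3
)
SELECT id, idx, sales_ts, sales
FROM (
  -- one output row per (input row, idx); the CASE picks the idx-th column
  SELECT
    id
  , idx
  , TIMESTAMPADD(HOUR, idx - 1, day_strt)::TIMESTAMP(0) AS sales_ts
  , CASE idx
      WHEN 1 THEN sales_01
      WHEN 2 THEN sales_02
      WHEN 3 THEN sales_03
    END AS sales
  FROM input
  CROSS JOIN three_idxs
) unpivoted
WHERE sales IS NOT NULL   -- drop the rows that came from NULL columns
ORDER BY id, idx;
```

With this input, the row for idx 2 should be filtered out, leaving the 08:00 and 10:00 rows for the TIMESERIES clause to interpolate between.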
    

    Have fun playing -

    Marco the Sane