I have the following data, where id is an integer and vectors is an array:
id, vectors
1, [1,2,3]
2, [2,3,4]
3, [3,4,5]
I want to explode the vectors column together with each element's index position, so that the result looks like this:
+---+-----+------+
|id |index|vector|
+---+-----+------+
|1 |0 |1 |
|1 |1 |2 |
|1 |2 |3 |
|2 |0 |2 |
|2 |1 |3 |
|2 |2 |4 |
|3 |0 |3 |
|3 |1 |4 |
|3 |2 |5 |
+---+-----+------+
I think this can be done in Spark with selectExpr:
df.selectExpr("*", "posexplode(vectors) as (index, vector)")
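However, this is a relatively simple task and I would like to avoid writing an ETL script. I have been wondering whether the same expression could be used to create a view, so that the data is easy to access through Presto.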
Answer 0 (score: 4)
In Presto, this is easy to do with standard SQL syntax using UNNEST:
WITH data(id, vector) AS (
VALUES
(1, array[1,2,3]),
(2, array[2,3,4]),
(3, array[3,4,5])
)
SELECT id, index - 1 AS index, value
FROM data, UNNEST(vector) WITH ORDINALITY AS t(value, index)
Note that the index produced by WITH ORDINALITY is 1-based, so I subtract 1 from it to produce the output shown in your question.
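To make the off-by-one concrete, here is a minimal plain-Python sketch of what UNNEST ... WITH ORDINALITY does to the sample rows. It only illustrates the semantics and is not Presto code; the function name unnest_with_ordinality is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, List[int]]


def unnest_with_ordinality(rows: Iterable[Row]) -> Iterator[Tuple[int, int, int]]:
    """Yield (id, ordinality, value); ordinality starts at 1, as in SQL."""
    for row_id, vector in rows:
        for ordinality, value in enumerate(vector, start=1):
            yield row_id, ordinality, value


# Same sample data as in the question.
data = [(1, [1, 2, 3]), (2, [2, 3, 4]), (3, [3, 4, 5])]

# Subtract 1 to get the zero-based index asked for in the question.
result = [
    (row_id, ordinality - 1, value)
    for row_id, ordinality, value in unnest_with_ordinality(data)
]

for row in result:
    print(row)
```

Each input row produces one output row per array element, which is the shape of the table in the question.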
Answer 1 (score: 0)
You can use LATERAL VIEW in Hive to explode the array data.
Try the following query:
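-- row_number() orders by element value, so the index equals the array
-- position only because each sample array is already sorted.
select
  id,
  (row_number() over (partition by id order by col)) - 1 as `index`,
  col as vector
from (
  select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
  select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
  select 3 as id, array(3,4,5) as vectors from (select '1') t3
) t
-- the lateral view needs a column alias so that col can be referenced above
LATERAL VIEW explode(vectors) v as col;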
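One caveat about the query above: row_number() with order by col numbers the elements by their value, not by their position in the array. The two agree for the sorted sample arrays but not in general. The plain-Python sketch below shows the difference on an unsorted array; the helper names are hypothetical and the code only mimics the SQL semantics. In Hive, posexplode returns the position directly and avoids the issue.

```python
from typing import List, Tuple


def index_by_position(vector: List[int]) -> List[Tuple[int, int]]:
    """(index, value) pairs in array order, like posexplode."""
    return list(enumerate(vector))


def index_by_value_rank(vector: List[int]) -> List[Tuple[int, int]]:
    """(index, value) pairs numbered by sorted value, like
    row_number() over (order by col) - 1."""
    return list(enumerate(sorted(vector)))


sorted_vector = [1, 2, 3]    # as in the question's data
unsorted_vector = [3, 1, 2]  # hypothetical counter-example

# Sorted input: both numbering schemes agree.
print(index_by_position(sorted_vector) == index_by_value_rank(sorted_vector))

# Unsorted input: the value 3 sits at position 0 but gets rank 2.
print(index_by_position(unsorted_vector))
print(index_by_value_rank(unsorted_vector))
```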