Lateral flatten of two columns with different array lengths in Snowflake

Date: 2019-11-29 07:27:07

Tags: snowflake-data-warehouse

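I'm new to Snowflake and am currently learning to use lateral flatten.

I have a dummy table that looks like this: [image not available]

The data type of both the "Customer Number" and "Cities" columns is ARRAY.

I managed to understand and apply the flatten concept to explode the data with the following SQL statement: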

select c.customer_id, c.last_name, f.value as cust_num, f1.value as city
    from customers as c,
    lateral flatten(input => c.customer_number) f,
    lateral flatten(input => c.cities) f1
    where f.index = f1.index
    order by customer_id;

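The output it produces is: [image not available]

As the dummy table clearly shows, customer_id 104 in row 4 has 3 customer numbers. I want to see all three of them in the output, and where cities has no value at the matching index I want to see NULL in "city".

My expected output is: [image not available]

Is this possible?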

3 Answers:

Answer 0: (score: 2)

The trick is to remove the second lateral and use the index from the first one to pick values out of the second array:

  select c.customer_id, c.last_name, f.value as cust_num, c.cities[f.index] as city
    from customers as c,
    lateral flatten(input => c.customer_number) f
    order by customer_id;
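
To try this out without the original table, here is a minimal sketch that builds one row inline. The CTE and its values (`'Doe'`, the customer numbers, the city names) are made up for illustration and are not the asker's data; the sketch only assumes that both columns are ARRAYs, as stated in the question. It relies on Snowflake returning NULL when an array is indexed past its last element, so the third customer number should come back with a NULL city.

```sql
-- Hypothetical sample row: three customer numbers, but only two cities.
with customers as (
    select 104 as customer_id,
           'Doe' as last_name,
           array_construct(1001, 1002, 1003) as customer_number,
           array_construct('Oslo', 'Bergen') as cities
)
select c.customer_id,
       c.last_name,
       f.value           as cust_num,
       c.cities[f.index] as city      -- index 2 is past the end of cities
from customers as c,
     lateral flatten(input => c.customer_number) f
order by c.customer_id, f.index;
```

Because only `customer_number` is flattened, the number of output rows per customer follows that array, so this approach fits the case where `cities` is never the longer of the two.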

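Answer 1: (score: 1)

As long as you can be sure the second list will be the shorter one, you can do the following (note that this version treats the columns as comma-separated strings, hence the split):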

select customer_id, last_name, list1_table.value::varchar as customer_number, 
split(cities,',')[list1_table.index]::varchar as city
from customers, lateral flatten(input=>split(customer_number, ',')) list1_table;

Otherwise, you will have to do a union between the two sets of records (a regular union will eliminate the duplicates).
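
A rough sketch of that union, assuming ARRAY columns as in the question rather than comma-separated strings. The inline CTE rows are hypothetical sample data, and the `idx` column is an addition of this sketch: each branch flattens one array and looks the other one up by index, and the plain `union` then collapses the rows both branches produce.

```sql
-- Hypothetical sample rows: 104 has more numbers than cities, 105 the reverse.
with customers as (
    select 104 as customer_id, 'Doe' as last_name,
           array_construct('1001', '1002', '1003') as customer_number,
           array_construct('Oslo', 'Bergen') as cities
    union all
    select 105, 'Roe',
           array_construct('2001'),
           array_construct('Paris', 'Lyon')
)
-- Branch 1: one row per customer number, city looked up by index.
select c.customer_id, c.last_name, f.index as idx,
       f.value::varchar           as cust_num,
       c.cities[f.index]::varchar as city
from customers as c,
     lateral flatten(input => c.customer_number) f
union
-- Branch 2: one row per city, customer number looked up by index.
select c.customer_id, c.last_name, f.index as idx,
       c.customer_number[f.index]::varchar as cust_num,
       f.value::varchar                    as city
from customers as c,
     lateral flatten(input => c.cities) f
order by customer_id, idx;
```

Keeping `idx` in the select list is a deliberate choice: without it, two positions that happen to hold the same number and city pair would also be merged by the union.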

Answer 2: (score: 0)

You probably want to use a LEFT OUTER JOIN for this, but you first need to create a rowset version of cities.

select c.customer_id, c.last_name, f.value as cust_num, f1.value as city
    from customers as c
    cross join lateral flatten(input => c.customer_number) f
    left outer join (select * from customers, lateral flatten(input => cities)) f1
                 on f.index = f1.index
                and c.customer_id = f1.customer_id  -- only match cities of the same customer
    order by customer_id;