I have a table like this:
user_id email
u1 e1, e2
u2 null
My goal is to convert it into the following format:
user_id email
u1 e1
u1 e2
u2 null
Hive SQL:
select * FROM table LATERAL VIEW explode(split(email, ',')) email AS email_id;
When I execute the above query in Hive I get the null row, but when I run the same query in spark-sql the null row is missing. This problem and scenario have already been discussed here.
Spark SQL:
select * FROM table LATERAL VIEW OUTER explode(split(email, ',')) email AS email_id;
select * from table lateral view POSEXPLODE_OUTER(split(email, ',')) email as email_id;
The second query gives a syntax error. I searched for lateral view with posexplode_outer but did not find much. How can I get the null rows in the spark-sql output?
Answer 0 (score: 2)
Spark SQL does not use HiveQL. It is partially compatible with it, but the two should not be mistaken for each other. Instead of Hive's LATERAL VIEW syntax, you can use a plain SELECT with explode_outer, which keeps rows whose array is NULL:
SELECT user_id, explode_outer(split(email, ',')) AS email_id FROM table;
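The difference between explode and explode_outer can be illustrated without a Spark cluster. Below is a minimal pure-Python sketch (a model of the semantics, not Spark code): explode drops any row whose array is NULL, while explode_outer keeps it and emits a NULL element.

```python
# Pure-Python model of Spark's explode vs explode_outer semantics.
# Each row is (user_id, email); like split(email, ','), a NULL (None)
# email yields a NULL array rather than a list.

rows = [("u1", "e1,e2"), ("u2", None)]

def split_email(email):
    # Mirrors split(email, ','): NULL in, NULL out.
    return email.split(",") if email is not None else None

def explode(rows):
    # Inner explode: rows whose array is NULL are dropped entirely.
    out = []
    for user, email in rows:
        arr = split_email(email)
        if arr is not None:
            out.extend((user, e) for e in arr)
    return out

def explode_outer(rows):
    # Outer explode: a NULL array still produces one row with a NULL element.
    out = []
    for user, email in rows:
        arr = split_email(email)
        out.extend((user, e) for e in (arr if arr is not None else [None]))
    return out

print(explode(rows))        # [('u1', 'e1'), ('u1', 'e2')]
print(explode_outer(rows))  # [('u1', 'e1'), ('u1', 'e2'), ('u2', None)]
```

This is exactly why the inner-explode query loses the u2 row: its split result is NULL, so there is nothing to iterate over.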
Answer 1 (score: 1)
Adding coalesce after the split seems to work:
with tmp_table as (
select 'u1' as user, 'e1,e2' as email
union all
select 'u2' as user, NULL as email
)
select * FROM tmp_table
LATERAL VIEW explode(coalesce(split(email, ','), array(NULL))) email AS email_id;
Output:
u1 e1,e2 e1
u1 e1,e2 e2
u2 NULL NULL
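Why this trick works can be sketched in plain Python as well (an illustration of the semantics, not Spark code): coalesce(split(email, ','), array(NULL)) replaces the NULL array with a one-element [NULL] array, so an ordinary inner explode no longer has a NULL array to drop.

```python
# Pure-Python model of explode(coalesce(split(email, ','), array(NULL))).
rows = [("u1", "e1,e2"), ("u2", None)]

def coalesce(value, fallback):
    # coalesce returns its first non-NULL argument.
    return value if value is not None else fallback

out = []
for user, email in rows:
    # NULL split result becomes [None], a perfectly explodable array.
    arr = coalesce(email.split(",") if email is not None else None, [None])
    for e in arr:                     # plain (inner) explode, never sees NULL
        out.append((user, email, e))  # the query selects *, so keep email too

print(out)  # [('u1', 'e1,e2', 'e1'), ('u1', 'e1,e2', 'e2'), ('u2', None, None)]
```

The third tuple matches the "u2 NULL NULL" row in the output above: the row survives because it carries one NULL element rather than no array at all.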
Answer 2 (score: 0)
LATERAL VIEW OUTER was added in Spark 2.2.0.
For example:
scala> spark.sql(
| "select * FROM table LATERAL VIEW OUTER explode ( split ( email ,',' ) ) email AS email_id"
| ).show
+-------+------+--------+
|user_id| email|email_id|
+-------+------+--------+
| u1|e1, e2| e1|
| u1|e1, e2| e2|
| u2| null| null|
+-------+------+--------+