Spark: How to include null rows in a lateral view explode

Posted: 2018-10-12 14:38:45

Tags: apache-spark hive apache-spark-sql hiveql

I have a table as follows:

user_id email
u1      e1, e2
u2      null

My goal is to convert it to the following format:

user_id email
u1      e1
u1      e2
u2      null

Hive SQL: select * FROM table LATERAL VIEW explode(split(email, ',')) email AS email_id

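When I execute the above query in Hive, I get the null values, but when I run the same query in spark-sql, the null values are missing. This problem and scenario have already been discussed here.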

Spark SQL:

  1. select * FROM table LATERAL VIEW OUTER explode ( split ( email ,',' ) ) email AS email_id;
  2. select * from table lateral view POSEXPLODE_OUTER(split(email,',')) email as email_id

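The second one gives a syntax error. I tried searching for lateral view with posexplode_outer but found very little. I want the null rows to be included in spark-sql.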
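To see why the u2 row disappears, here is a plain-Scala sketch of the row semantics. It is not Spark's implementation, and the names (`ExplodeModel`, `explodeOuter`, `posexplodeOuter`) are hypothetical. It assumes the behavior Spark documents for `explode` and `explode_outer`: an inner explode emits no rows for a NULL or empty array, while the outer variant emits one row with NULL. It also models `posexplode_outer`, which generates two columns (a 0-based position and the value), so a LATERAL VIEW over it expects two column aliases (for example `AS pos, email_id`) rather than one. The sample data is written as "e1,e2" without the space.

```scala
// Plain-Scala model of how explode / explode_outer / posexplode_outer treat rows.
// Hypothetical names; this mimics the row semantics only, not Spark's implementation.
object ExplodeModel {
  // One input row; None models a SQL NULL in the email column.
  final case class Row(userId: String, email: Option[String])

  type Out = (String, Option[String])                 // (user_id, email_id)
  type PosOut = (String, Option[Int], Option[String]) // (user_id, pos, email_id)

  // Mimics split(email, ','): a NULL string gives a NULL array.
  // The -1 limit keeps trailing empty strings.
  def split(email: Option[String]): Option[Seq[String]] =
    email.map(_.split(",", -1).toSeq)

  // Inner explode: a NULL or empty array yields zero rows, so the input row disappears.
  def explode(rows: Seq[Row]): Seq[Out] =
    rows.flatMap { r =>
      split(r.email).getOrElse(Seq.empty[String]).map(e => (r.userId, Option(e)))
    }

  // Outer explode: a NULL or empty array yields one row whose value is NULL.
  def explodeOuter(rows: Seq[Row]): Seq[Out] =
    rows.flatMap { r =>
      val elems = split(r.email).getOrElse(Seq.empty[String])
      if (elems.isEmpty) Seq[Out]((r.userId, None))
      else elems.map(e => (r.userId, Option(e)))
    }

  // posexplode_outer: two generated columns (0-based position and value).
  def posexplodeOuter(rows: Seq[Row]): Seq[PosOut] =
    rows.flatMap { r =>
      val elems = split(r.email).getOrElse(Seq.empty[String])
      if (elems.isEmpty) Seq[PosOut]((r.userId, None, None))
      else elems.zipWithIndex.map { case (e, i) => (r.userId, Option(i), Option(e)) }
    }

  // Sample table from the question.
  val table: Seq[Row] = Seq(Row("u1", Some("e1,e2")), Row("u2", None))

  def main(args: Array[String]): Unit = {
    println(explode(table))
    println(explodeOuter(table))
    println(posexplodeOuter(table))
  }
}
```

Running `main` prints the inner result with no u2 row at all, followed by the two outer results, where u2 survives with `None` (NULL) in the generated column(s).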

3 answers:

Answer 0 (score: 2)

Spark SQL does not use HiveQL. It is partially compatible with it, but the two should not be confused. Instead of LATERAL VIEW, you should use SELECT:

SELECT

Answer 1 (score: 1)

Wrapping the split in coalesce seems to work:

with tmp_table as ( 
  select 'u1' as user, 'e1,e2' as email 
  union all 
  select 'u2' as user, NULL as email
)
select * FROM tmp_table 
LATERAL VIEW explode ( coalesce(split ( email ,',' ), array(NULL)) ) email AS email_id;

Output:

u1  e1,e2   e1
u1  e1,e2   e2
u2  NULL    NULL
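Why the coalesce trick works can be sketched in plain Scala: `coalesce` swaps a NULL array for a one-element array containing NULL, so even the inner explode emits exactly one row (with a NULL value) for u2. As before, this is a hypothetical model (`CoalesceModel` and its methods are illustrative names), not Spark code.

```scala
// Plain-Scala model of: LATERAL VIEW explode(coalesce(split(email, ','), array(NULL)))
// Hypothetical names; mimics the row semantics only, not Spark's implementation.
object CoalesceModel {
  type Out = (String, Option[String]) // (user, email_id); None models NULL

  // Sample rows from the query above: (user, email), None = NULL.
  val table: Seq[(String, Option[String])] = Seq(("u1", Some("e1,e2")), ("u2", None))

  // split(email, ','): NULL in, NULL out. Elements are Options so an array can hold NULL.
  def split(email: Option[String]): Option[Seq[Option[String]]] =
    email.map(_.split(",", -1).toSeq.map(s => Option(s)))

  // coalesce(split(...), array(NULL)): a NULL array becomes the one-element array [NULL].
  def coalesceSplit(email: Option[String]): Seq[Option[String]] =
    split(email).getOrElse(Seq[Option[String]](None))

  // Plain (inner) explode: one output row per array element.
  def explodeCoalesced(rows: Seq[(String, Option[String])]): Seq[Out] =
    rows.flatMap { case (user, email) => coalesceSplit(email).map(e => (user, e)) }

  def main(args: Array[String]): Unit =
    println(explodeCoalesced(table))
}
```

Note that `coalesce` only replaces a NULL array: an empty but non-NULL array would still produce no rows under a plain explode, whereas LATERAL VIEW OUTER would keep the row. With `split` this difference should not arise, because splitting a non-NULL string returns at least one element (possibly an empty string).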

Answer 2 (score: 0)

LATERAL VIEW OUTER was added in Spark 2.2.0.

For example:

scala> spark.sql(
     |   "select * FROM table LATERAL VIEW OUTER explode ( split ( email ,',' ) ) email AS email_id"
     | ).show
+-------+------+--------+
|user_id| email|email_id|
+-------+------+--------+
|     u1|e1, e2|      e1|
|     u1|e1, e2|      e2|
|     u2|  null|    null|
+-------+------+--------+