How to get the Nth string after a pipe delimiter in Hive

Date: 2019-06-26 16:32:02

Tags: hive

I have a table in Hive, and I want to extract the 5th part of the string stored in one of its columns.

Sample data:

john:12|doe|google|usa|google.com|newspaper - title - 1 - volume - 1234|360671191
john:34|doe|fb|usa|google.com|newspaper - title - X - volume - 1233|360671192
john:45|doe|twitter|usa|google.com|newspaper - title - Y - volume - 1232|360671193
jane:45:1323

I want to extract the 5th string after the first pipe character (|). The values of the output column would be:

newspaper - title - 1 - volume - 1234
newspaper - title - X - volume - 1233
newspaper - title - Y - volume - 1232
jane:45:1323

If the title is not present (as in record 4), the original string should be returned unchanged.

1 Answer:

Answer 0 (score: 0)

Use the split function, as follows:

with your_data as (
select stack(4,
'john:12|doe|google|usa|google.com|newspaper - title - 1 - volume - 1234|360671191',
'john:34|doe|fb|usa|google.com|newspaper - title - X - volume - 1233|360671192',
'john:45|doe|twitter|usa|google.com|newspaper - title - Y - volume - 1232|360671193',
'jane:45:1323'
) as str
)

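-- split() takes a regular expression, so the pipe must be escaped as '\\|'.
-- Array indexes are zero-based: [0] is the text before the first pipe,
-- so [5] is the 5th string after it. An out-of-range index yields NULL,
-- and nvl() then falls back to the original string.
select nvl(splitted_str[5], original_str) as result
  from (
        select split(str, '\\|') as splitted_str,
               str as original_str
          from your_data
       ) s;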

Returns:

newspaper - title - 1 - volume - 1234
newspaper - title - X - volume - 1233
newspaper - title - Y - volume - 1232
jane:45:1323
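
The same lookup can also be written without the subquery. The following is a minimal sketch, not a tested replacement: it assumes the same your_data CTE and str column as the query above (trimmed here to two of the sample rows), and it swaps nvl for coalesce, which should behave the same way for two arguments.

```sql
-- Sketch only: single-level variant of the query above.
-- your_data and str are the same illustrative names used in the answer.
with your_data as (
  select stack(2,
    'john:12|doe|google|usa|google.com|newspaper - title - 1 - volume - 1234|360671191',
    'jane:45:1323'
  ) as str
)
-- split() uses a regex, so the pipe is escaped; index 5 is the 5th
-- string after the first pipe. Rows with fewer parts produce NULL at
-- that index, and coalesce() then returns the original string.
select coalesce(split(str, '\\|')[5], str) as result
  from your_data;
```

To fetch a different part, change the index: since element 0 is the text before the first pipe, index N is the Nth string after it.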