I have a table in Hive, and I want to extract the fifth part of a string from one of its columns.
Sample data:
john:12|doe|google|usa|google.com|newspaper - title - 1 - volume - 1234|360671191
john:34|doe|fb|usa|google.com|newspaper - title - X - volume - 1233|360671192
john:45|doe|twitter|usa|google.com|newspaper - title - Y - volume - 1232|360671193
jane:45:1323
I want to extract the 5th string after the first pipe character (|). The expected values of the output column are:
newspaper - title - 1 - volume - 1234
newspaper - title - X - volume - 1233
newspaper - title - Y - volume - 1232
jane:45:1323
If the title is missing (as in record 4), the original string should be returned as is.
Answer 0 (score: 0)
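Use the split function. It takes a regular expression, so the pipe has to be escaped as '\\|'. The resulting array is zero-based, so index 5 is the 5th string after the first pipe, and an index past the end of the array yields NULL, which nvl replaces with the original string: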
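-- Sample data: stack(4, ...) turns the four literals into four rows of one column
with your_data as (
select stack(4,
'john:12|doe|google|usa|google.com|newspaper - title - 1 - volume - 1234|360671191',
'john:34|doe|fb|usa|google.com|newspaper - title - X - volume - 1233|360671192',
'john:45|doe|twitter|usa|google.com|newspaper - title - Y - volume - 1232|360671193',
'jane:45:1323'
) as str
)
-- Element 5 of the split array, or the original string when that element is NULL
select nvl(splitted_str[5], original_str) as result
from
(
-- Split on the literal pipe; keep the original string for the fallback
select split(str,'\\|') as splitted_str, str as original_str
from your_data
)s;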
Returns:
newspaper - title - 1 - volume - 1234
newspaper - title - X - volume - 1233
newspaper - title - Y - volume - 1232
jane:45:1323
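The nvl fallback above relies on an out-of-range array index returning NULL. If you would rather make that condition explicit, one possible variant checks the array length with size() before indexing. This is only a sketch under the same assumptions as the answer: the sample rows and the names your_data, str, and parts are illustrative, not taken from a real table.

```sql
-- Sketch: explicit length check instead of relying on NULL for a missing element.
-- your_data, str and parts are illustrative names.
with your_data as (
  select stack(2,
    'john:12|doe|google|usa|google.com|newspaper - title - 1 - volume - 1234|360671191',
    'jane:45:1323'
  ) as str
)
select case
         when size(parts) > 5 then parts[5]  -- at least 6 elements, so index 5 exists
         else str                            -- too few parts: keep the original string
       end as result
from (
  select str, split(str, '\\|') as parts     -- split on the literal pipe
  from your_data
) s;
```

Both forms should give the same result on the sample data. The size() check mainly makes the "fewer than six parts" case visible to whoever reads the query later.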