Split a delimited column into unique rows in Hive

Time: 2016-11-17 15:52:18

Tags: hive

I have a dataset. See the sample row below:

94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507; 1460777656:440515; 1460778054:440488; 1460778157:440481,440600;

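The columns are separated by spaces (3 columns in total). The column names are id (int), unid (string), and time_stamp (string).

I want to split the dataset so that each unique element gets its own row, like this: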

  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440600

Each bullet point is one row. I used the following query, but it does not work; it returns the output shown below it:

SELECT id, unid, time_date FROM table LATERAL VIEW explode(SPLIT(time_date,' \;')) time_date AS time_date;

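Output: 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507; 1460777656:440515; 1460778054:440488; 1460778157:440481,440600; (this row is repeated 5 times)

Any help would be appreciated! Thanks in advance :)

1 Answer:

Answer 0: (score: 1)

First, I had to replace the semicolons with pipes. So: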

CREATE temporary TABLE tbl 
(id int,
unid string,
time_stamp string);

INSERT INTO tbl 
VALUES (
94654, '6802D326-9F9B-4FC8-B2DD-F878EADE31F2' , '1460695483:440507|1460777656:440515|1460778054:440488|1460778157:440481,440600');

SELECT
id,
unid,
time_stamp
FROM
(
SELECT
id,
unid,
split(time_stamp,'\\|')  ts
FROM
tbl
) t
lateral VIEW explode(t.ts) bar AS time_stamp;

Which gives us:

94654   6802D326-9F9B-4FC8-B2DD-F878EADE31F2    1460695483:440507
94654   6802D326-9F9B-4FC8-B2DD-F878EADE31F2    1460777656:440515
94654   6802D326-9F9B-4FC8-B2DD-F878EADE31F2    1460778054:440488
94654   6802D326-9F9B-4FC8-B2DD-F878EADE31F2    1460778157:440481,440600
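
Note that the last row above still carries two IDs (440481,440600), whereas the question asks for one row per ID. Here is a minimal sketch of one way to split that comma list as well, using the same pattern of splitting in a derived table and exploding in the outer query. The aliases ts_entry, ts_epoch, id_list and item_id are made up for illustration, and the sketch assumes every entry has the form epoch:id[,id...]:

```sql
-- Sketch only: a second split/explode pass over the comma-separated IDs.
-- Assumes each entry looks like '<epoch>:<id>[,<id>...]'.
SELECT
  s.id,
  s.unid,
  concat(s.ts_epoch, ':', v.item_id) AS time_stamp       -- rebuild 'epoch:id'
FROM
(
  SELECT
    t.id,
    t.unid,
    split(bar.ts_entry, ':')[0]             AS ts_epoch,  -- part before the colon
    split(split(bar.ts_entry, ':')[1], ',') AS id_list    -- array of IDs after the colon
  FROM
  (
    SELECT id, unid, split(time_stamp, '\\|') AS ts
    FROM tbl
  ) t
  LATERAL VIEW explode(t.ts) bar AS ts_entry
) s
LATERAL VIEW explode(s.id_list) v AS item_id;
```

On the sample row this is expected to return the five rows listed in the question, but treat it as a starting point and check it against your own data.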

You have to do the split and the explode in separate steps. So we do the split in a derived table and the explode/lateral view in the outer query.
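
If replacing the semicolons in the data is not an option, one possible workaround is to write the semicolon as the octal escape \073 inside the split pattern, so that no literal semicolon appears in the query text. A sketch, assuming a hypothetical table tbl_raw that has the same three columns but still holds the original semicolon-delimited string:

```sql
-- Sketch only: tbl_raw is a hypothetical table holding the original,
-- semicolon-delimited time_stamp string from the question.
SELECT
  t.id,
  t.unid,
  bar.ts_entry AS time_stamp
FROM
(
  SELECT
    id,
    unid,
    split(time_stamp, '\073 *') AS ts   -- \073 is the octal code for a semicolon, then optional spaces
  FROM tbl_raw
) t
LATERAL VIEW explode(t.ts) bar AS ts_entry
WHERE bar.ts_entry <> '';               -- drop the empty piece left by the trailing delimiter
```

The WHERE clause is there because the sample string ends with a delimiter, which may leave an empty final element after the split. Whether the escape is needed at all depends on how your Hive client parses semicolons, so this is an assumption to verify in your environment.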