Reading tuples from a file in Pig Latin

Time: 2020-01-06 19:56:22

Tags: apache-pig

Here is an example from https://pig.apache.org/docs/r0.17.0/basic.html:
cat data;
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)

 A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));

 DUMP A;
 ((3,8,9),(4,5,6))
 ((1,4,7),(3,7,5))
 ((2,5,8),(9,5,8))

I created a file tp.txt under maria_dev containing the same data, i.e.

(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)

and loaded it with:

tp = LOAD 'tp.txt' as (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));

But when I run DUMP tp, I get the following output:

((3,8,9),)
((1,4,7),)
((2,5,8),)

What am I doing wrong?

1 answer:

Answer 0 (score: 0)

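By default, the LOAD statement uses PigStorage, which assumes your fields are separated by tabs. Your text file appears to use spaces instead. Without changing the file, you can do the following: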

tp = LOAD 'tp.txt' USING PigStorage(' ') AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
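To see why the delimiter matters, here is a rough Python sketch of splitting one line of tp.txt on a tab versus a space. It only illustrates the idea and is not Pig's actual loader code; `split_fields` is a hypothetical helper made up for this example.

```python
def split_fields(line, delimiter="\t"):
    """Split one input line into fields on a single-character delimiter."""
    return line.rstrip("\n").split(delimiter)


# One line from tp.txt, with a space (not a tab) between the two tuples.
line = "(3,8,9) (4,5,6)"

tab_fields = split_fields(line)         # default: tab delimiter
space_fields = split_fields(line, " ")  # explicit space delimiter

print(len(tab_fields), tab_fields)
print(len(space_fields), space_fields)
```

With a tab delimiter the whole line comes back as a single field, so nothing is left to fill t2, which seems consistent with the empty second tuple in the question's output. With a space delimiter the line yields two fields, one per tuple.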

Alternatively, you can replace the spaces in the text file with tabs and leave your LOAD statement unchanged.
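One possible way to do that replacement is sketched below in Python. It assumes each line holds exactly two tuples separated by a single space, as in the question's data; `spaces_to_tabs` is a hypothetical helper name, and the sample text stands in for the contents of tp.txt.

```python
def spaces_to_tabs(text):
    """Replace the space between adjacent tuples on each line with a tab."""
    lines = text.splitlines()
    # Only the ") (" boundary between tuples is touched, so any spaces
    # elsewhere in a line would be left alone.
    return "\n".join(line.replace(") (", ")\t(") for line in lines) + "\n"


# Sample data copied from the question (assumed contents of tp.txt).
sample = "(3,8,9) (4,5,6)\n(1,4,7) (3,7,5)\n(2,5,8) (9,5,8)\n"

converted = spaces_to_tabs(sample)
print(converted, end="")
```

To apply this to the real file, you would read tp.txt, pass its contents through the function, and write the result back. If the file lives in HDFS, the converted copy would presumably need to be uploaded again before running the original LOAD statement.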