Unable to read data from a Pig script

Time: 2014-11-13 13:10:25

Tags: apache-pig

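Please let me know how to read the data when the fields are separated by a delimiter ('|') and one of the fields is a bag. I am getting the error below.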

Input Data.
Jorge Posada Yankees|{(Catcher),(Designated_hitter)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell Oakland|{(Catcher),(First_baseman)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado Atlanta|{(Second_baseman),(Infielder),(Left_fielder)},[games#258,hit_by_pitch#3]

bfile= LOAD '/home/cloudera/basketball.txt' using PigStorage('|')as(name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);

grunt> players = load 'basketball.txt' using PigStorage('|')as (name:chararray, team:chararray,position:bag{t:(p:chararray)}, bat:map[]);
2014-11-13 04:49:48,144 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 27, column 117>  mismatched input ';' expecting RIGHT_PAREN
Details at logfile: /home/cloudera/pig_1415835089181.log

Sanjeeb

1 answer:

Answer 0 (score: 0)

For the input above no regular expression is needed; you can access all of the values through the existing schema itself.

input.txt:

Jorge Posada |Yankees|{(Catcher),(Designated_hitter)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell |Oakland|{(Catcher),(First_baseman)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado |Atlanta|{(Second_baseman),(Infielder),(Left_fielder)}|[games#258,hit_by_pitch#3]

Pig script:

bfile= LOAD 'input.txt' using PigStorage('|') as (name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);

--Print the name and team
B = FOREACH bfile GENERATE name,team;
--DUMP B;

--Print the player and his position
C = FOREACH bfile GENERATE name,pos.(p);
--DUMP C;

--Print the player and the values stored under the games and hit_by_pitch keys
D = FOREACH bfile GENERATE name,bat#'games',bat#'hit_by_pitch';
--DUMP D;
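
The values looked up from bat come back as bytearrays, because the schema declares an untyped map (map[]). A minimal sketch, assuming the counts should be treated as integers; the alias E and the output field names are illustrative and not part of the original script:

```pig
-- Cast the looked-up map values explicitly.
-- E, games and hit_by_pitch are illustrative names, not from the original script.
E = FOREACH bfile GENERATE name, (int)bat#'games' AS games, (int)bat#'hit_by_pitch' AS hit_by_pitch;
--DUMP E;
```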

Output of DUMP B:

(Jorge Posada ,Yankees)
(Landon Powell ,Oakland)
(Martin Prado ,Atlanta)

Output of DUMP C:

(Jorge Posada ,{(Catcher),(Designated_hitter)})
(Landon Powell ,{(Catcher),(First_baseman)})
(Martin Prado ,{(Second_baseman),(Infielder),(Left_fielder)})
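
If one row per position is more convenient than a nested bag, the bag can be un-nested with FLATTEN. A small sketch, assuming the bfile relation from the script above; the alias F is illustrative:

```pig
-- FLATTEN un-nests the bag, so each (name, position) pair becomes its own row.
-- F is an illustrative alias, not from the original script.
F = FOREACH bfile GENERATE name, FLATTEN(pos);
--DUMP F;
```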

Output of DUMP D:

(Jorge Posada ,1594,65)
(Landon Powell ,26,)
(Martin Prado ,258,3)
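
The empty third field for Landon Powell is a null: his map has no hit_by_pitch key, and looking up a missing key in a map returns null. A sketch, assuming you only want the players that actually have that key; the aliases G and H are illustrative:

```pig
-- Keep only the rows whose map contains the hit_by_pitch key.
-- G and H are illustrative aliases, not from the original script.
G = FILTER bfile BY bat#'hit_by_pitch' IS NOT NULL;
H = FOREACH G GENERATE name, bat#'hit_by_pitch';
--DUMP H;
```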

If you need more than one field in each tuple of the bag, declare and access them like this:

pos:bag{t:(p:chararray,q:chararray)}
FOREACH bfile GENERATE name,pos.(p,q);