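If I have delimiter-separated fields (here '|') and the data contains a bag, how do I read it? Please let me know. I am getting the error below.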
Input Data.
Jorge Posada Yankees|{(Catcher),(Designated_hitter)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell Oakland|{(Catcher),(First_baseman)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado Atlanta|{(Second_baseman),(Infielder),(Left_fielder)},[games#258,hit_by_pitch#3]
bfile= LOAD '/home/cloudera/basketball.txt' using PigStorage('|')as(name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);
grunt> players = load 'basketball.txt' using PigStorage('|')as (name:chararray, team:chararray,position:bag{t:(p:chararray)}, bat:map[]);
2014-11-13 04:49:48,144 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 27, column 117> mismatched input ';' expecting RIGHT_PAREN
Details at logfile: /home/cloudera/pig_1415835089181.log
Sanjeeb
Answer 0 (score: 0)
For the input above no regular expression is needed; you can access all the values through the existing schema itself.
input.txt:
Jorge Posada |Yankees|{(Catcher),(Designated_hitter)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell |Oakland|{(Catcher),(First_baseman)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado |Atlanta|{(Second_baseman),(Infielder),(Left_fielder)}|[games#258,hit_by_pitch#3]
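Note that this input.txt differs from the data in the question: a `|` now separates the name from the team, and the third line uses `|` instead of `,` before the map. With PigStorage('|') each `|` marks a field boundary, so only this version yields the four fields the schema declares. A rough way to check the field count, sketched in plain Python (this only models the split on the delimiter, it is not Pig's loader; `split_fields` and `EXPECTED_FIELDS` are names made up for the illustration):

```python
# Rough model of how PigStorage('|') splits a line into fields.
# Plain Python for illustration only, not Pig's actual loader.
EXPECTED_FIELDS = 4  # name, team, pos, bat


def split_fields(line, delimiter="|"):
    """Split one input line on the field delimiter."""
    return line.rstrip("\n").split(delimiter)


# First line as posted in the question (no '|' between name and team).
question_line = (
    "Jorge Posada Yankees|{(Catcher),(Designated_hitter)}"
    "|[games#1594,hit_by_pitch#65,grand_slams#7]"
)
# Same line as it appears in input.txt above.
answer_line = (
    "Jorge Posada |Yankees|{(Catcher),(Designated_hitter)}"
    "|[games#1594,hit_by_pitch#65,grand_slams#7]"
)

for label, line in (("question", question_line), ("answer", answer_line)):
    fields = split_fields(line)
    # Report the field count and whether it matches the schema.
    print(label, len(fields), len(fields) == EXPECTED_FIELDS)
```

The question's line gives three fields and the corrected line gives four, which is why the four-column schema only lines up with the corrected file.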
Pigscript:
bfile= LOAD 'input.txt' using PigStorage('|') as (name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);
--Print the name and team
B = FOREACH bfile GENERATE name,team;
--DUMP B;
--Print the player and his position
C = FOREACH bfile GENERATE name,pos.(p);
--DUMP C;
--Print the player and key/value of games and hit_by_pitch
D = FOREACH bfile GENERATE name,bat#'games',bat#'hit_by_pitch';
--DUMP D;
Output of DUMP B:
(Jorge Posada ,Yankees)
(Landon Powell ,Oakland)
(Martin Prado ,Atlanta)
Output of DUMP C:
(Jorge Posada ,{(Catcher),(Designated_hitter)})
(Landon Powell ,{(Catcher),(First_baseman)})
(Martin Prado ,{(Second_baseman),(Infielder),(Left_fielder)})
Output of DUMP D:
(Jorge Posada ,1594,65)
(Landon Powell ,26,)
(Martin Prado ,258,3)
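The empty last field for Landon Powell is expected: his map has no hit_by_pitch key, so the lookup bat#'hit_by_pitch' returns null, which DUMP prints as nothing. A minimal Python sketch of the same lookup behaviour (an illustration only; `parse_map` is a hypothetical helper, not part of Pig, and it keeps every value as a string):

```python
# Plain-Python model of a Pig map literal such as [games#26,home_runs#7].
# Illustration only; Pig's own parser is not used here.


def parse_map(text):
    """Turn '[k1#v1,k2#v2]' into a dict of string keys and string values."""
    inner = text.strip()[1:-1]  # drop the surrounding [ ]
    pairs = (item.split("#", 1) for item in inner.split(",") if item)
    return {key: value for key, value in pairs}


bat = parse_map("[on_base_percentage#0.297,games#26,home_runs#7]")

print(bat.get("games"))         # key present
print(bat.get("hit_by_pitch"))  # key absent, so None (Pig would give null)
```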
If you need more than one field inside the bag, declare and access them like this:
pos:bag{t:(p:chararray,q:chararray)}
FOREACH bfile GENERATE name,pos.(p,q);