I have a file that contains two sets of records, as shown below:
1,abc,10,dss
2,efgh,as
1,abc,10,1234
2,efgh,as
1,abc,10,7899
2,efgh,as
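Records that start with 1 form one set, and records that start with 2 form a different set, so the two sets have different structures. How can I separate these two sets of records?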
Answer 0 (score: 0)
Here is one way:
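A = LOAD '/user/data/split.txt' AS (line:chararray);
-- Break each line on spaces so that every record becomes its own row
B = FOREACH A GENERATE FLATTEN(TOKENIZE(line, ' '));
-- Keep the records whose text starts with 1 or with 2, respectively
B1 = filter B by $0 matches '1.*';
B2 = filter B by $0 matches '2.*';
DUMP B1;
DUMP B2;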
or
SPLIT B INTO B1 IF ($0 matches '1.*'), B2 IF ($0 matches '2.*');
Answer 1 (score: 0)
With the updated version of the input (one record per line), here is another solution:
A = LOAD '/user/data/split.txt' AS (line:chararray);
B1 = filter A by $0 matches '1.*';
B2 = filter A by $0 matches '2.*';
or
SPLIT A INTO B1 IF ($0 matches '1.*'), B2 IF ($0 matches '2.*');
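Because the two sets have different structures, a natural next step is to give each one its own schema after the split. The following is a minimal sketch, not a definitive implementation. It assumes one record per line, as in the sample data, and the field names (rec_type, name, amount, code, tag) are hypothetical, since the question does not name the columns. It also tightens the patterns to '1,.*' and '2,.*' so that a leading value such as 10 or 21 would not be picked up by mistake.

```pig
-- Assumption: one record per line, as in the sample data in the question
A = LOAD '/user/data/split.txt' AS (line:chararray);

-- Requiring the comma after the leading digit keeps values like 10 or 21 out of each set
SPLIT A INTO B1 IF (line matches '1,.*'), B2 IF (line matches '2,.*');

-- Hypothetical field names; type 1 records have four fields, type 2 records have three
C1 = FOREACH B1 GENERATE FLATTEN(STRSPLIT(line, ',', 4))
     AS (rec_type:chararray, name:chararray, amount:chararray, code:chararray);
C2 = FOREACH B2 GENERATE FLATTEN(STRSPLIT(line, ',', 3))
     AS (rec_type:chararray, name:chararray, tag:chararray);

DUMP C1;
DUMP C2;
```

Every field is kept as chararray here. If a column such as amount is known to be numeric, it can be cast in a later FOREACH step.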