Regex to extract the first part of a string in Apache Pig

Date: 2015-04-23 05:31:14

Tags: apache-pig

I need to extract the postcode district from the input data below:

AB55 4
DD7 6LL
DD5 2HI

My code:

A = load 'data' as (postcode:chararray);
B = foreach A {
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1);
generate code_district;
};
dump B;

The output should look like this:

AB55
DD7
DD5

What regular expression should I use to extract the first part of the string?

1 answer:

Answer 0 (score: 1)

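Could you try the regexes below? Both capture the leading run of letters and digits (everything before the space) as group 1, which REGEX_EXTRACT returns; the trailing `.*` just consumes the rest of the string.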

Option 1:

A = LOAD 'input' AS (postcode:chararray);
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\\w+).*',1);
DUMP code_district;

Option 2:

A = LOAD 'input' AS (postcode:chararray);
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1);
DUMP code_district;

Output:

(AB55)
(DD7)
(DD5)
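To try a pattern without launching a Pig job, here is a minimal Java sketch. It assumes that REGEX_EXTRACT is backed by `java.util.regex`, searches the input with `Matcher.find()`, and returns capture group 1 (or null when nothing matches); the class and method names `PostcodeDistrict` and `extractDistrict` are hypothetical. The backslash is doubled in both the Pig literal `'(\\w+).*'` and the Java literal `"(\\w+).*"`, so both compile to the regex `(\w+).*`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics REGEX_EXTRACT(postcode, '(\\w+).*', 1)
class PostcodeDistrict {
    // Same pattern as Option 1: group 1 is the leading run of word characters
    private static final Pattern DISTRICT = Pattern.compile("(\\w+).*");

    // Returns capture group 1, or null when the input is null or nothing matches
    static String extractDistrict(String postcode) {
        if (postcode == null) {
            return null;
        }
        Matcher m = DISTRICT.matcher(postcode);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Prints (AB55), (DD7), (DD5) on separate lines, like the DUMP output above
        for (String pc : new String[] {"AB55 4", "DD7 6LL", "DD5 2HI"}) {
            System.out.println("(" + extractDistrict(pc) + ")");
        }
    }
}
```

Because `.*` absorbs everything after the captured district, this pattern gives the same group 1 whether the engine matches the whole string or only searches for a substring.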