I used the following commands:
X1 = LOAD '/PIG10/' using PigStorage(',') as (statename:chararray,district:chararray,code:chararray,ru:chararray);
Y1 = LOAD '/POP2/' using PigStorage(',') as (district:chararray,r_u:chararray);
I have four columns of data in X1:
(JAMMU & KASHMIR JAMMU & KASHMIR 00000 Total,,,)
(JAMMU & KASHMIR JAMMU & KASHMIR 00000 Rural,,,)
(JAMMU & KASHMIR JAMMU & KASHMIR 00000 Urban,,,)
(JAMMU & KASHMIR Kupwara 00000 Total,,,)
(JAMMU & KASHMIR Kupwara 00000 Rural,,,)
(JAMMU & KASHMIR Kupwara 00000 Urban,,,)
(JAMMU & KASHMIR Badgam 00000 Total,,,)
(JAMMU & KASHMIR Badgam 00000 Rural,,,)
(JAMMU & KASHMIR Badgam 00000 Urban,,,)
(JAMMU & KASHMIR Leh(Ladakh) 00000 Total,,,)
(JAMMU & KASHMIR Leh(Ladakh) 00000 Rural,,,)
(JAMMU & KASHMIR Leh(Ladakh) 00000 Urban,,,)
(JAMMU & KASHMIR Kargil 00000 Total,,,)
(JAMMU & KASHMIR Kargil 00000 Rural,,,)
(JAMMU & KASHMIR Kargil 00000 Urban,,,)
(JAMMU & KASHMIR Punch 00000 Total,,,)
(JAMMU & KASHMIR Punch 00000 Rural,,,)
Y1 looks like this:
(JAMMU & KASHMIR Total,)
(JAMMU & KASHMIR Rural,)
(JAMMU & KASHMIR Urban,)
(Kupwara Total,)
(Kupwara Rural,)
(Kupwara Urban,)
(Badgam Total,)
(Badgam Rural,)
(Badgam Urban,)
(Leh(Ladakh) Total,)
(Leh(Ladakh) Rural,)
(Leh(Ladakh) Urban,)
(Kargil Total,)
(Kargil Rural,)
(Kargil Urban,)
(Punch Total,)
(Punch Rural,)
(Punch Urban,)
(Rajouri Total,)
(Rajouri Rural,)
(Rajouri Urban,)
I ran the join C2 = JOIN X1 BY district, Y1 BY district; but I cannot get any output.
Answer 0 (score: 1)
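The reason is that every input line is loaded whole into the first column, so the remaining three columns of X1 (district, code, ru) and the remaining column of Y1 (r_u) are empty. Because the join key district is null on both sides, the inner join matches nothing. It looks like the ',' delimiter does not fit your input data. Can you paste the actual input format of the PIG10 and POP2 files?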
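One way to confirm this diagnosis before changing the loader is to count the rows whose join key came back null. This is only a minimal sketch: it reuses the LOAD statement from the question, and the alias names X1_null, X1_all and X1_cnt are made up for illustration.

```pig
-- Load exactly as in the question (comma delimiter, four declared columns).
X1 = LOAD '/PIG10/' USING PigStorage(',') AS (statename:chararray, district:chararray, code:chararray, ru:chararray);

-- Keep only the rows where the join key is null.
-- (X1_null, X1_all and X1_cnt are hypothetical alias names.)
X1_null = FILTER X1 BY district IS NULL;

-- Collapse to a single group so the rows can be counted.
X1_all = GROUP X1_null ALL;
X1_cnt = FOREACH X1_all GENERATE COUNT_STAR(X1_null);

DUMP X1_cnt;
```

If the count equals the number of input lines, no row was split on ',' and none of them can take part in the join. The same check can be repeated on Y1.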
Solution:
Try this script. The regexes below are written against the sample input above only, so they may need adjusting for other rows.
-- Load each file as a single chararray column holding the whole line.
X = LOAD '/PIG10/' AS (line:chararray);
Y = LOAD '/POP2/' AS (line1:chararray);
-- Split each line into fields: one capture group per column.
X1 = FOREACH X GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '(\\w+|\\w+\\s+&\\s+\\w+)\\s+([a-zA-Z()]+|\\w+\\s+&\\s+\\w+)\\s+(\\w+)\\s+(\\w+)')) AS (statename:chararray,district:chararray,code:chararray,ru:chararray);
Y1 = FOREACH Y GENERATE FLATTEN(REGEX_EXTRACT_ALL(line1, '([a-zA-Z()]+|\\w+\\s+&\\s+\\w+)\\s+(\\w+)')) AS (district:chararray,r_u:chararray);
-- Join on the district column now that it is populated.
C2 = JOIN X1 BY district, Y1 BY district;
DUMP C2;
Sample output:
(JAMMU & KASHMIR,Punch,00000,Total,Punch,Rural)
(JAMMU & KASHMIR,Punch,00000,Total,Punch,Urban)
(JAMMU & KASHMIR,Badgam,00000,Urban,Badgam,Rural)
(JAMMU & KASHMIR,Badgam,00000,Urban,Badgam,Total)
(JAMMU & KASHMIR,Badgam,00000,Urban,Badgam,Urban)
(JAMMU & KASHMIR,Leh(Ladakh),00000,Urban,Leh(Ladakh),Rural)
(JAMMU & KASHMIR,Leh(Ladakh),00000,Urban,Leh(Ladakh),Total)
(JAMMU & KASHMIR,Leh(Ladakh),00000,Urban,Leh(Ladakh),Urban)
(JAMMU & KASHMIR,JAMMU & KASHMIR,00000,Rural,JAMMU & KASHMIR,Urban)
(JAMMU & KASHMIR,JAMMU & KASHMIR,00000,Rural,JAMMU & KASHMIR,Rural)