In Pig

Time: 2015-04-28 05:19:53

Tags: regex apache-pig

I want to split a string so I can do an area conversion. My data looks like this.

(149Sq.Yards)
(151Sq.Yards)
(190Sq.Yards)
(190Sq.Yards)

I want to split the data above like this.

149  sq.yards
151  sq.yards

I tried the following code.

a = LOAD '/user/ahmedabad/Makkan_PropertyDetails_Apartment_Ahmedabad.csv' using PigStorage('\t') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray,PriceperArea:chararray,NoOfBedRooms:int,NoOfBathRooms:int,FloorNoOfProperty:chararray,TotalFloors:int,Possession:chararray,BuiltUpArea:chararray,Furnished:chararray,Ownership:chararray,NewResale:chararray,Facing:chararray,title:chararray,PropertyAddress:chararray,NearByFacilities:chararray,PropertyFeatures:chararray,Sellerinfo:chararray,Description:chararray);
b = FOREACH a GENERATE BuiltUpArea; 
c = FILTER b BY (BuiltUpArea matches '.*Sq.Yards.*');
d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(.*)', 1) * 9;   

But when I dump d, it prints null.

1 Answer:

Answer 0: (score: 0)

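The regex you used, '(.*)', matches every character, so the whole string is extracted and Pig tries to evaluate something like (149Sq.Yards * 9). That is why the output is null.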

The following regex extracts only the leading digits from the input, so the multiplication becomes (149 * 9).

d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(^[0-9]+)', 1) * 9;
dump d;
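
The question also asked for the number and the unit as separate values. A minimal sketch of that split is below. It assumes the stored values look like 149Sq.Yards (digits followed directly by the unit, with the parentheses in the sample being Pig's tuple display rather than part of the data). The aliases e, area_num and area_unit are made up for illustration and do not come from the original script.

```pig
-- Sketch only: assumes BuiltUpArea holds values such as 149Sq.Yards.
-- e, area_num and area_unit are hypothetical names used for illustration.
e = FOREACH c GENERATE
        -- group 1 = the leading run of digits
        REGEX_EXTRACT(BuiltUpArea, '^([0-9]+).*$', 1) AS area_num,
        -- group 1 = everything after the digits, lower-cased
        LOWER(REGEX_EXTRACT(BuiltUpArea, '^[0-9]+(.*)$', 1)) AS area_unit;
dump e;
```

Both patterns cover the whole string, so they should behave the same whether REGEX_EXTRACT does a full or a partial match. If the parentheses really are part of the stored text, neither pattern will match and both fields will come back null.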