I have the following data:
address|some_mask_value
123 Main | 10100011110
124 Main | 10100011100
I am using Apache Pig version 0.15.0.2.4.2.0-258.
I am trying to create an indicator for whether the second-to-last character of 'some_mask_value' is a 1. I have tried:
load_data = LOAD '/myfile.txt' USING PigStorage('|') AS (address:String, some_mask_value:String);
grunt> case_test = FOREACH load_data GENERATE (CASE trial
>> WHEN LAST_INDEX_OF(name, '1') 2 THEN yes
>> ELSE no);
2017-04-20 16:59:50,522 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 5, column 30> mismatched input '2' expecting THEN
Basically, if the second-to-last character is 1, I will filter that row out later.
Answer 0 (score: 1)
a = load 'data.txt' using PigStorage('|')
as (address: chararray, some_mask_value:chararray);
If the mask field has a fixed length, as in the sample data, then:
b = foreach a generate $0 .. , (
CASE SUBSTRING(some_mask_value, 9, 10)
WHEN '1' THEN 'YES'
ELSE 'NO'
END
) as indicator;
dump b;
(123 Main,10100011110,YES)
(124 Main,10100011100,NO)
If the mask does not have a fixed length:
b = foreach a generate $0 .. , (
CASE SUBSTRING(some_mask_value, (int)SIZE(some_mask_value) - 2, (int)SIZE(some_mask_value) - 1)
WHEN '1' THEN 'YES'
ELSE 'NO'
END
) as indicator;
dump b;
(123 Main,10100011110,YES)
(124 Main,10100011100,NO)
This assumes the mask field has no leading or trailing whitespace.
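
If the input does have padding around the delimiter, as the sample rows in the question appear to ('123 Main | 10100011110'), one option is to TRIM the fields before taking the substring. The following is a minimal sketch, not run against the asker's Pig version. The relation names trimmed, flagged and kept are hypothetical, and it assumes every mask has at least two characters. It also applies the filtering step the question mentions, keeping only the rows whose second-to-last character is not 1.

```pig
-- Sketch only: relation names are illustrative, not from the original answer.
a = LOAD 'data.txt' USING PigStorage('|')
    AS (address:chararray, some_mask_value:chararray);

-- Strip leading/trailing whitespace left over from the ' | ' separators.
trimmed = FOREACH a GENERATE
    TRIM(address) AS address,
    TRIM(some_mask_value) AS some_mask_value;

-- Same variable-length logic as above, applied to the trimmed mask.
-- SIZE returns a long, so it is cast to int for SUBSTRING.
flagged = FOREACH trimmed GENERATE address, some_mask_value, (
    CASE SUBSTRING(some_mask_value, (int)SIZE(some_mask_value) - 2, (int)SIZE(some_mask_value) - 1)
        WHEN '1' THEN 'YES'
        ELSE 'NO'
    END
) AS indicator;

-- Drop the rows whose second-to-last mask character is 1.
kept = FILTER flagged BY indicator == 'NO';
DUMP kept;
```

If the file includes the header line (address|some_mask_value), it is loaded as an ordinary row and would need to be filtered out separately.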