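I am trying to filter a set of descriptions from one schema based on the IDs in the first schema.
I am new to Pig, so I am having a hard time grasping this.
Below is the code I have put together, which does not work: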
changeReason = LOAD 'Change_Reason.txt' USING org.apache.pig.piggybank.storage.CSVExcelStorage('|', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (changeReasonID:int, reasonName:chararray);
price = LOAD '$directory/Price.txt' USING org.apache.pig.piggybank.storage.CSVExcelStorage('|', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (priceID:int, changeReasonID:int);
priceChangeReasonIDs = GROUP price BY changeReasonID;
subGroup = FOREACH priceChangeReasonIDs
{
    change = FILTER changeReason BY changeReasonID == group.changeReasonId;
    GENERATE group AS changeID, change.reasonName AS Reason;
};
The code gives the following error:
Failed to parse: Pig script failed to parse:
<file load_historical_price.pig, line 108, column 20> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)
Answer 0 (score: 0)
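If I understand you correctly, you want to filter grouped data on an element of the group key.
This working example may help.
Here is my sample script: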
data = LOAD 'SO/data.txt' USING PigStorage(' ') AS (val:int, id1:chararray, id2:int);
DESCRIBE data;
dgroup = GROUP data BY (id1, id2);
DESCRIBE dgroup;
dfilter = FILTER dgroup BY group.id1 == 'B';
DESCRIBE dfilter;
DUMP dfilter;
It groups the data by (id1, id2) and then filters the groups on id1.
Sample input:
12 A 1
22 A 2
32 B 1
33 B 2
43 B 1
55 A 2
77 B 2
88 A 1
Result of the DUMP:
((B,1),{(43,B,1),(32,B,1)})
((B,2),{(77,B,2),(33,B,2)})
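A possible variation, sketched under the assumption that only the 'B' groups are needed afterwards: filtering on the plain column before the GROUP keeps the other rows out of the grouping step. The relation names bOnly and bGroups are made up for this illustration; the path and schema are taken from the script above.

```pig
-- Same input file and schema as the example script above.
data = LOAD 'SO/data.txt' USING PigStorage(' ') AS (val:int, id1:chararray, id2:int);

-- Filter on the column first, so rows with other id1 values never reach the GROUP.
bOnly = FILTER data BY id1 == 'B';

-- Group the remaining rows by the same composite key.
bGroups = GROUP bOnly BY (id1, id2);
DUMP bGroups;
```

This should produce the same two groups as the DUMP above, although the order of the tuples inside each bag is not guaranteed.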
Is this what you are trying to do?
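If the goal is instead to look up the reasonName for every changeReasonID that occurs in price, a JOIN may be closer to what the question needs. As far as I can tell, the parse error arises because a nested FOREACH block can only operate on bags that come from the relation being iterated, so the reference to the separate changeReason relation is treated as a scalar. Note also that after GROUP price BY changeReasonID the group field is a plain int, so group.changeReasonId has nothing to project. The sketch below reuses the LOAD statements from the question; the relation names priceReasonIDs, usedIDs, joined and result are hypothetical.

```pig
-- LOAD statements copied from the question.
changeReason = LOAD 'Change_Reason.txt' USING org.apache.pig.piggybank.storage.CSVExcelStorage('|', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (changeReasonID:int, reasonName:chararray);
price = LOAD '$directory/Price.txt' USING org.apache.pig.piggybank.storage.CSVExcelStorage('|', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (priceID:int, changeReasonID:int);

-- Reduce price to the distinct reason IDs it actually uses.
priceReasonIDs = FOREACH price GENERATE changeReasonID;
usedIDs = DISTINCT priceReasonIDs;

-- Join against the lookup relation instead of filtering it inside a nested FOREACH.
joined = JOIN usedIDs BY changeReasonID, changeReason BY changeReasonID;

-- Keep one ID column and the matching description.
result = FOREACH joined GENERATE usedIDs::changeReasonID AS changeID, changeReason::reasonName AS Reason;
DUMP result;
```

This is an inner join, so reason IDs that appear in price but have no row in Change_Reason.txt are dropped; a LEFT OUTER join would keep them with a null Reason.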