I'm new to Pig. Here is some pseudocode for what I'm trying to accomplish:
FOREACH split_records {
UPDATE updated_volume SET
open=updated_volume.open*split_records.multiply_by/split_records.divide_by,
close=updated_volume.close*split_records.multiply_by/split_records.divide_by
WHERE split_records.symbol=updated_volume.symbol AND
updated_volume.date < split_records.split_date
}
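To pin down the intended result, here is a minimal Python sketch that runs the pseudocode literally on in-memory rows. The field names come from the pseudocode; the tickers, dates and prices are made-up sample data. It is only a reference for checking a Pig script's output on a small sample, not a way to run the job.

```python
from datetime import date


def apply_splits(updated_volume, split_records):
    """Literal translation of the pseudocode: for each split record,
    rescale open/close of every earlier row with the same symbol."""
    for split in split_records:
        for row in updated_volume:
            if (row["symbol"] == split["symbol"]
                    and row["date"] < split["split_date"]):
                row["open"] = row["open"] * split["multiply_by"] / split["divide_by"]
                row["close"] = row["close"] * split["multiply_by"] / split["divide_by"]
    return updated_volume


# Hypothetical sample data: two splits for "ABC" that each halve the price,
# and no splits for "XYZ".
split_records = [
    {"symbol": "ABC", "split_date": date(2020, 6, 1), "multiply_by": 1, "divide_by": 2},
    {"symbol": "ABC", "split_date": date(2021, 6, 1), "multiply_by": 1, "divide_by": 2},
]
updated_volume = [
    {"symbol": "ABC", "date": date(2020, 1, 2), "open": 100.0, "close": 120.0},
    {"symbol": "ABC", "date": date(2021, 1, 4), "open": 60.0, "close": 64.0},
    {"symbol": "ABC", "date": date(2022, 1, 3), "open": 30.0, "close": 32.0},
    {"symbol": "XYZ", "date": date(2020, 1, 2), "open": 10.0, "close": 11.0},
]

adjusted = apply_splits(updated_volume, split_records)
for row in adjusted:
    print(row["symbol"], row["date"], row["open"], row["close"])
```

A row dated before several splits picks up the product of all their factors (the first ABC row ends at 25.0 / 30.0), so any Pig version has to multiply every applicable factor together rather than apply just one.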
Here is the Pig code I have so far:
FOREACH split_records {
SPLIT updated_volume INTO split_yes IF updated_volume.symbol==split_records.symbol AND
updated_volume.date < split_records.splitDate, split_no IF
updated_volume.symbol!=split_records.symbol OR
updated_volume.date >= split_records.splitDate;
updated_splits = FOREACH split_yes GENERATE
symbol,
date,
(split_yes.open*split_records.multiply_by/split_records.divide_by) AS open,
(split_yes.close*split_records.multiply_by/split_records.divide_by) AS close;
updated_volume = UNION updated_splits, split_no;
};
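The code above gives the error "mismatched input 'SPLIT' expecting GENERATE", so it definitely won't work. Basically, I'm trying to emulate an UPDATE ... WHERE operation in which the WHERE condition depends on a variable that comes from iterating over another set of records whose length/count is unknown.
My vague impression is that Pig isn't really a language meant for iteration, so I'm open to any approach that gets this done.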
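Answer 0 (score: 1)
I think this code does what you're trying to do. For each record in updated_volume, it applies all of the corresponding split_records that take effect after that record's date.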
-- my_udfs.py is a hypothetical file containing the product UDF
REGISTER 'my_udfs.py' USING jython AS my_udfs;

cgrp = COGROUP updated_volume BY symbol, split_records BY symbol;
SPLIT cgrp INTO
    did_split IF SIZE(split_records) > 0,
    did_not_split OTHERWISE;
-- reflatten data for symbols that did not split
not_updated = FOREACH did_not_split GENERATE
    FLATTEN(updated_volume);
-- update data for symbols that did split
to_be_updated = FOREACH did_split GENERATE
    FLATTEN(updated_volume) AS (symbol, volume_date, open, close),
    split_records;
updated = FOREACH to_be_updated {
    -- only splits dated after this row affect it
    -- (assumes split_records has a split_date field, as in the pseudocode)
    applicable_splits = FILTER split_records BY split_date > volume_date;
    GENERATE
        symbol, volume_date AS date,
        -- NOTE: Pig has no built-in PRODUCT function, so you would have to
        -- write a quick UDF in Jython (or Java) that multiplies the numbers
        -- in a bag; here it is called my_udfs.product
        open * my_udfs.product(applicable_splits.multiply_by)
             / my_udfs.product(applicable_splits.divide_by) AS open,
        close * my_udfs.product(applicable_splits.multiply_by)
              / my_udfs.product(applicable_splits.divide_by) AS close;
}
updated_volume = UNION updated, not_updated;
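The product UDF mentioned in the answer could look roughly like this minimal Jython sketch. The file name my_udfs.py and the decision to skip null values are assumptions. It relies on Pig passing a bag to a Jython UDF as a list of tuples, and on Pig supplying the outputSchema decorator when the file is registered with REGISTER 'my_udfs.py' USING jython AS my_udfs; the small fallback definition exists only so the file can also be imported and tested under plain Python.

```python
# my_udfs.py (hypothetical file name)
try:
    outputSchema  # provided by Pig when the file is registered USING jython
except NameError:
    # Fallback no-op decorator so the module also loads under plain Python.
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


@outputSchema("product:double")
def product(bag):
    """Multiply the first field of every tuple in a bag.

    A projection such as applicable_splits.multiply_by arrives as a list of
    one-field tuples, e.g. [(1,), (2,)]. Null bags, null tuples and null
    values are skipped; an empty bag returns 1.0, the multiplicative identity.
    """
    result = 1.0
    for t in bag or []:
        if t is None or t[0] is None:
            continue
        result *= t[0]
    return result
```

Returning 1.0 for an empty bag means a row with no later splits keeps its original open and close, so the same GENERATE expression works for every row of a symbol that split.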
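Answer 1 (score: 0)
You can use Pig's bincond operator, (condition ? value_if_true : value_if_false), when generating the open and close columns.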