In Pig, I need to set end_dt to the next record's eff_dt minus 1 day for a given id, and default it to 9999-12-31 for the last record of that id.
Input data:
id eff_dt end_dt
1 2012-02-28 9999-12-31
1 2013-03-15 9999-12-31
1 2014-05-01 9999-12-31
Required result (sort by eff_dt, then take the next record):
id eff_dt end_dt
1 2012-02-28 2013-03-14
1 2013-03-15 2014-04-30
1 2014-05-01 9999-12-31
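To pin down the rule before writing any Pig, here is a minimal plain-Python sketch of the required transformation (not Pig code). The function name `assign_end_dates` and the `(id, eff_dt)` tuple layout are hypothetical. It sorts by id and eff_dt, sets each end_dt to the next eff_dt minus one day, and falls back to 9999-12-31 for the last record of each id.

```python
from datetime import date, timedelta
from itertools import groupby

OPEN_END = "9999-12-31"  # default end_dt for the last record of each id


def assign_end_dates(rows):
    """rows: iterable of (id, eff_dt) with eff_dt as 'yyyy-MM-dd' strings.

    Returns (id, eff_dt, end_dt) tuples sorted by id, then eff_dt.
    """
    result = []
    # 'yyyy-MM-dd' strings sort chronologically, so a plain string sort works
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for key, group in groupby(ordered, key=lambda r: r[0]):
        eff_dates = [eff for _, eff in group]
        for pos, eff in enumerate(eff_dates):
            if pos + 1 < len(eff_dates):
                # "lead": the next record's eff_dt, minus one day
                nxt = date.fromisoformat(eff_dates[pos + 1])
                end = (nxt - timedelta(days=1)).isoformat()
            else:
                # no next record for this id: use the default
                end = OPEN_END
            result.append((key, eff, end))
    return result


rows = [(1, "2012-02-28"), (1, "2013-03-15"), (1, "2014-05-01")]
for r in assign_end_dates(rows):
    print(r)
```

Applied to the sample input, this gives 2013-03-14 (not 2013-02-14) as the first end_dt, because the next eff_dt is 2013-03-15.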
I'm new to Apache Pig. I found that lead/lag can be done with Stitch/Over and FLATTEN, but I don't know how to use them in a script to get the result above. I'm facing a few issues:
Issue 1: Pig loads the date as a chararray, so I need to convert eff_dt into a date.
Issue 2: I want to know the syntax for "date minus 1 day".
Issue 3: How to use lead/lag to get the next record, subtract one day from it, and fall back to a default when there is no next record.
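Issues 1 and 2 boil down to parse, subtract, format. The following is a plain-Python sketch of that round trip (the helper name `day_before` is hypothetical). It shows why real date arithmetic is needed instead of string manipulation: subtracting a day has to cross month, year and leap-day boundaries correctly.

```python
from datetime import datetime, timedelta


def day_before(eff_dt: str) -> str:
    """Parse a 'yyyy-MM-dd' string, subtract one day, and format it back."""
    parsed = datetime.strptime(eff_dt, "%Y-%m-%d").date()  # Issue 1: string -> date
    previous = parsed - timedelta(days=1)                   # Issue 2: minus one day
    return previous.strftime("%Y-%m-%d")                    # back to a string


print(day_before("2013-03-15"))  # 2013-03-14
print(day_before("2012-03-01"))  # 2012-02-29 (2012 is a leap year)
print(day_before("2014-01-01"))  # 2013-12-31 (crosses a year boundary)
```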
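I found the following sample code on the Apache Pig website, but I don't see how to adapt it to my use case:
To find the record 3 rows after the current record, use a window between the current row and the 3 following rows, with a default value of 0.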
A = load 'T';
B = group A by si;
C = foreach B {
C1 = order A by i;
generate flatten(Stitch(C1, Over(C1.i, 'lead', 0, 3, 3, 0)));
}
D = foreach C generate s, $9;
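This is equivalent to the SQL statement:
select s, lead(i, 3, 0) over (partition by si order by i rows between current row and 3 following) from T;
Any help would be appreciated.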
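To make the semantics of that lead(i, 3, 0) call concrete, here is a plain-Python sketch of what it computes (not how Pig implements it). Rows are assumed to be dicts with the keys `si`, `i` and `s` from the example's schema, and the function name `lead_over_partitions` is hypothetical. Within each `si` partition, ordered by `i`, each row gets the `i` value `offset` rows ahead, or `default` when there is no such row.

```python
from itertools import groupby


def lead_over_partitions(rows, offset=3, default=0):
    """Emulate LEAD(i, offset, default) OVER (PARTITION BY si ORDER BY i).

    rows: list of dicts with keys 'si', 'i', 's'. Returns (s, lead_value) pairs.
    """
    out = []
    # group by partition key, then order by i inside each partition
    ordered = sorted(rows, key=lambda r: (r["si"], r["i"]))
    for _, part in groupby(ordered, key=lambda r: r["si"]):
        part = list(part)
        for pos, row in enumerate(part):
            ahead = pos + offset
            value = part[ahead]["i"] if ahead < len(part) else default
            out.append((row["s"], value))
    return out


rows = [{"si": "a", "i": k, "s": f"r{k}"} for k in range(1, 6)]
print(lead_over_partitions(rows))
```

The question's use case is the same pattern with offset 1 (the next record) and a date default instead of 0.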
Answer (score: 0)
You have 3 questions; for now I can only answer the first two.
How to convert yyyy-MM-dd into a date and subtract one day:
-- first load the piggybank and define shorthand to Over and Stitch functions
REGISTER '/data/lib/piggybank-0.12.0.jar';
DEFINE Over org.apache.pig.piggybank.evaluation.Over();
DEFINE Stitch org.apache.pig.piggybank.evaluation.Stitch();
-- load the input data
data = LOAD '/data' USING PigStorage('\t') AS (id:int, eff_dt:chararray);
-- compute the day before each eff_dt (this could also be done later)
data_before = FOREACH data {
date = ToDate(eff_dt, 'yyyy-MM-dd');
dayBefore = SubtractDuration(date, 'P1D');
eff_before = ToString(dayBefore, 'yyyy-MM-dd');
GENERATE id as id, eff_dt as eff_dt, eff_before as eff_before;
}
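-- Stitch joins two bags position by position
-- Over applies a window function to a bag; here the 'lead' function fetches the value from the next tuple
-- group per id and sort by eff_dt, because the order of tuples inside a bag is not guaranteed
data_over = FOREACH (GROUP data_before BY id) {
sorted_data = ORDER data_before BY eff_dt;
-- the question asks for '9999-12-31' as the default; pass that instead of '9999-99-99' to match it
out = Stitch(sorted_data, Over(sorted_data.eff_before, 'lead', 0, 1, 1, '9999-99-99'));
GENERATE FLATTEN(out) AS (id, eff_dt, eff_before, end_dt);
}
-- finally, project the output columns (the date conversion could also be done here)
data_final = FOREACH data_over GENERATE id, eff_dt, end_dt;
DUMP data_final;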
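The Stitch + Over step can be mirrored in plain Python to check what it should produce. This is a sketch of my understanding of the two functions, not their implementation: `over_lead` and `stitch` are hypothetical helpers, and the default is set to '9999-12-31' as the question requires. The list `data_before` stands in for the bag built by the first FOREACH (id, eff_dt, and the day before eff_dt).

```python
def over_lead(values, offset=1, default=None):
    """Mimic Over(bag.col, 'lead', 0, offset, offset, default): the value `offset` rows ahead."""
    return [values[k + offset] if k + offset < len(values) else default
            for k in range(len(values))]


def stitch(tuples, column):
    """Mimic Stitch: append the k-th value of `column` to the k-th tuple."""
    return [t + (v,) for t, v in zip(tuples, column)]


data_before = [
    (1, "2012-02-28", "2012-02-27"),
    (1, "2013-03-15", "2013-03-14"),
    (1, "2014-05-01", "2014-04-30"),
]
sorted_bag = sorted(data_before, key=lambda t: t[1])  # ORDER data_before BY eff_dt
out = stitch(sorted_bag, over_lead([t[2] for t in sorted_bag], 1, "9999-12-31"))
final = [(i, eff, end) for i, eff, _, end in out]     # drop eff_before
for row in final:
    print(row)
```

The lead is taken over eff_before (already shifted back a day), so each row picks up the next row's eff_dt minus one day without any further arithmetic.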
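Update: I finally had a chance to try the piggybank Over and Stitch functions, and the script above is a working solution.
The output of this script is:
(1,2012-02-28,2013-03-14)
(1,2013-03-15,2014-04-30)
(1,2014-05-01,9999-99-99)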