In the icCube Builder ETL, I want to group data by multiple fields, using MAX and MIN as the aggregation functions.
Sample data (as plain text):
groupId phase startDate endDate
100 start 1-May-2018 5-May-2018
100 start 4-May-2018 7-May-2018
100 start 28-Apr-2018 1-May-2018
100 middle 4-May-2018 11-May-2018
100 middle 1-May-2018 10-May-2018
100 end 12-May-2018 15-May-2018
100 end 11-May-2018 13-May-2018
100 end 13-May-2018 14-May-2018
100 end 9-May-2018 12-May-2018
200 start 4-Apr-2018 2-May-2018
200 middle 18-Apr-2018 3-May-2018
200 middle 1-May-2018 1-May-2018
300 end 21-Apr-2018 24-Apr-2018
I want to group this data by groupId and phase, and get the minimum startDate and the maximum endDate for each group.
What is the best way to do this in the icCube ETL?
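Answer 0 (score: 1)
We have added a new version of the groupBy view to the ETL layer to support this. However, you can also create a Java view that performs the groupBy itself.
Something like: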
package iccube.pub;

import java.util.*;
import org.joda.time.*;
import crazydev.iccube.pub.view.*;

public class CustomJavaView implements IOlapBuilderViewLogic
{
    // group-by key (list of values) -> aggregators for that key
    private Map<List<Comparable>, List<Agg>> cached;

    public CustomJavaView()
    {
    }

    public void onInitMainTable(Map<String, IOlapCachedTable> cachedTables, IOlapDataTableDef mainTable)
    {
        cached = new HashMap<>();
    }

    public boolean onNewRow(IOlapViewContext context, Map<String, IOlapCachedTable> cachedTables, IOlapDataTableDef mainTable, IOlapReadOnlyDataRow mainTableRow)
    {
        // create the group-by key (list of values)
        final List<Comparable> groupBy = Arrays.asList(mainTableRow.get("phase"), mainTableRow.get("groupId"));
        // get the aggregators for this key, building them if not already there:
        // index 0 keeps the minimum (startDate), index 1 keeps the maximum (endDate)
        final List<Agg> aggs = cached.computeIfAbsent(groupBy, key -> Arrays.asList(new Agg(true), new Agg(false)));
        // add values
        aggs.get(0).add(mainTableRow.getAsDateTime("startDate"));
        aggs.get(1).add(mainTableRow.getAsDateTime("endDate"));
        return true; // false to stop
    }

    public void onProcessingCompleted(IOlapViewContext context, Map<String, IOlapCachedTable> cachedTables)
    {
        // all input rows have been seen, so now we can fire one row per group
        for (Map.Entry<List<Comparable>, List<Agg>> entry : cached.entrySet())
        {
            final List<Comparable> groupByKey = entry.getKey();
            final List<Agg> aggs = entry.getValue();
            // create empty row
            final IOlapDataTableRow row = context.newRow();
            row.set("phase", groupByKey.get(0));
            row.set("groupId", groupByKey.get(1));
            row.set("startDate", aggs.get(0).date);
            row.set("endDate", aggs.get(1).date);
            context.fireRow(row);
        }
    }

    // this is the Aggregator, you could implement something more complicated
    static class Agg
    {
        // -1 to keep the minimum, 1 to keep the maximum
        final int sign;
        LocalDateTime date;

        Agg(boolean isMin)
        {
            this.sign = isMin ? -1 : 1;
        }

        void add(LocalDateTime ndate)
        {
            // null values are ignored
            if (ndate != null)
            {
                // keep the current date only if it beats the new one in the chosen direction
                date = (date != null && ((date.compareTo(ndate) * sign) > 0)) ? date : ndate;
            }
        }
    }
}
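To check the grouping logic outside icCube, the same idea can be sketched in plain Java against the sample data from the question. This is only an illustration under stated assumptions: the names GroupByMinMaxSketch, Range, aggregate and SAMPLE are hypothetical, it uses java.time.LocalDate instead of Joda-Time, and it does not touch the icCube view API at all.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class GroupByMinMaxSketch {

    // Hypothetical holder for one group's result: earliest start, latest end.
    static final class Range {
        LocalDate minStart;
        LocalDate maxEnd;

        void add(LocalDate start, LocalDate end) {
            if (start != null && (minStart == null || start.isBefore(minStart))) {
                minStart = start;
            }
            if (end != null && (maxEnd == null || end.isAfter(maxEnd))) {
                maxEnd = end;
            }
        }
    }

    // Parses dates written like "28-Apr-2018".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("d-MMM-yyyy", Locale.ENGLISH);

    // Sample rows from the question: {groupId, phase, startDate, endDate}.
    static final String[][] SAMPLE = {
        {"100", "start",  "1-May-2018",  "5-May-2018"},
        {"100", "start",  "4-May-2018",  "7-May-2018"},
        {"100", "start",  "28-Apr-2018", "1-May-2018"},
        {"100", "middle", "4-May-2018",  "11-May-2018"},
        {"100", "middle", "1-May-2018",  "10-May-2018"},
        {"100", "end",    "12-May-2018", "15-May-2018"},
        {"100", "end",    "11-May-2018", "13-May-2018"},
        {"100", "end",    "13-May-2018", "14-May-2018"},
        {"100", "end",    "9-May-2018",  "12-May-2018"},
        {"200", "start",  "4-Apr-2018",  "2-May-2018"},
        {"200", "middle", "18-Apr-2018", "3-May-2018"},
        {"200", "middle", "1-May-2018",  "1-May-2018"},
        {"300", "end",    "21-Apr-2018", "24-Apr-2018"},
    };

    // Groups rows by the composite key (groupId, phase), keeping first-seen order.
    static Map<List<String>, Range> aggregate(String[][] rows) {
        Map<List<String>, Range> result = new LinkedHashMap<>();
        for (String[] r : rows) {
            List<String> key = Arrays.asList(r[0], r[1]);
            result.computeIfAbsent(key, k -> new Range())
                  .add(LocalDate.parse(r[2], FMT), LocalDate.parse(r[3], FMT));
        }
        return result;
    }

    public static void main(String[] args) {
        aggregate(SAMPLE).forEach((key, range) ->
            System.out.println(key.get(0) + " " + key.get(1) + " "
                    + range.minStart + " " + range.maxEnd));
    }
}
```

For the sample data this prints one line per (groupId, phase) pair:

```
100 start 2018-04-28 2018-05-07
100 middle 2018-05-01 2018-05-11
100 end 2018-05-09 2018-05-15
200 start 2018-04-04 2018-05-02
200 middle 2018-04-18 2018-05-03
300 end 2018-04-21 2018-04-24
```

As in the Java view above, a List is used as the map key because its equals and hashCode compare element by element, which is what makes grouping on several fields work.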