Suppose my data looks like this:
Acct_id | amount
--------|-------
10001 |6.00
20000 |5.00
32356 |1.00
10001 |2.00
45000 |1.50
45000 |10.00
My expected result should look like this:
acct_id| count
-------|-----
10001 | 2
45000 | 2
How can I get this in Cassandra?
Answer 0 (score: 0)
How can I get this in Cassandra?
If you are using Cassandra 2.2.x or 3.x, you can create a user-defined aggregate (UDA):
CREATE FUNCTION counByAccId(state map<int, int>, acctid int)
RETURNS NULL ON NULL INPUT
RETURNS map<int, int>
LANGUAGE java
AS '
if(state.containsKey(acctid)) {
Integer currentCount = (Integer)state.get(acctid);
state.put(acctid, currentCount + 1);
} else {
state.put(acctid, 1);
}
return state;
';
CREATE AGGREGATE groupByAcctIdAndCount(int)
SFUNC counByAccId
STYPE map<int, int>
INITCOND {};
SELECT groupByAcctIdAndCount(acct_id) FROM myTable WHERE partition_key = xxx;
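To make the mechanics of the aggregate easier to follow, here is a minimal plain-Java sketch that imitates it outside Cassandra. It starts from an empty map (the INITCOND) and applies the same logic as the state function once per row. The class name UdaFoldSketch and its method names are hypothetical and exist only for this illustration; nothing here talks to a real cluster.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Cassandra: it only imitates the UDA locally.
class UdaFoldSketch {

    // Same logic as the body of the counByAccId state function.
    static Map<Integer, Integer> countByAcctId(Map<Integer, Integer> state, Integer acctId) {
        if (state.containsKey(acctId)) {
            Integer currentCount = state.get(acctId);
            state.put(acctId, currentCount + 1);
        } else {
            state.put(acctId, 1);
        }
        return state;
    }

    // Imitates the aggregate: start from an empty map (INITCOND {})
    // and call the state function once for every row's acct_id.
    static Map<Integer, Integer> aggregate(List<Integer> acctIds) {
        Map<Integer, Integer> state = new LinkedHashMap<>();
        for (Integer acctId : acctIds) {
            state = countByAcctId(state, acctId);
        }
        return state;
    }

    public static void main(String[] args) {
        // acct_id values taken from the question's data, in the order listed there.
        List<Integer> acctIds = Arrays.asList(10001, 20000, 32356, 10001, 45000, 45000);
        System.out.println(aggregate(acctIds));
    }
}
```

The whole result is a single map built up in one state value, which is why the aggregate returns one row rather than one row per account.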
Sample dataset:
select * from agg;
partition_key | acct_id | val
---------------+---------+-----
5 | 45000 | 1.5
1 | 10001 | 6
2 | 20000 | 5
4 | 10001 | 2
6 | 45000 | 10
3 | 32356 | 1
select groupByAcctIdAndCount(acct_id) FROM agg;
music.groupbyacctidandcount(acct_id)
------------------------------------------
{10001: 2, 20000: 1, 32356: 1, 45000: 2}
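The expected result in the question appears to list only the accounts that occur more than once, while the aggregate returns a count for every account. One possible way to close that gap is to filter the returned map on the client. The sketch below assumes the map has already been read into a java.util.Map by whatever client is in use; the class name RepeatedAccountFilter and the method keepRepeated are hypothetical names for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side helper: keeps only the accounts seen more than once.
class RepeatedAccountFilter {

    static Map<Integer, Integer> keepRepeated(Map<Integer, Integer> counts) {
        // TreeMap keeps the output sorted by acct_id.
        Map<Integer, Integer> repeated = new TreeMap<>();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                repeated.put(entry.getKey(), entry.getValue());
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        // The map shown in the cqlsh output above, rebuilt by hand.
        Map<Integer, Integer> counts = new TreeMap<>();
        counts.put(10001, 2);
        counts.put(20000, 1);
        counts.put(32356, 1);
        counts.put(45000, 2);
        System.out.println(keepRepeated(counts));
    }
}
```

Filtering on the client leaves the UDA unchanged. It suits cases where the map stays small enough to return in a single row.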
Warning: be sure to read my blog post on UDAs and the implications of scanning a full table: http://www.doanduyhai.com/blog/?p=2015