Question

I have a Hive table A with four string columns named: user, item, type, time

Sample input:

user  item  type   time
1     101    0     06-16   # June 16, 2013; all dates are in the same year
2     101    0     09-04
1     102    1     07-03

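There are 4 types (0, 1, 2, 3) with weights (1, 2, 3, 4). The time score of each row is defined as:
tScore = (time - 06-01-2013)/7
i.e. the number of weeks since June 1, and
fScore = weight of type * time score
I then need to aggregate fScore using (user, item) as the key, and sort the table by the aggregated fScore in decreasing order.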

I'm not sure I have described what I want clearly. Please comment if anything is unclear.

Answer 1

select user, item, (type + 1) * datediff(concat('2013-', time), '2013-06-01') / 7 as fScore
from A
order by fScore desc;
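
Note that the query above scores each row individually. To get one total per (user, item), as the question asks, the same expression can be wrapped in sum() with a group by. This is a minimal sketch, assuming table A exactly as described in the question; the backticks are a precaution, since some Hive versions treat names such as user as reserved words.

```sql
-- Sum the per-row fScore for each (user, item) pair, highest total first.
-- type is a string column; Hive casts it implicitly for (type + 1).
select `user`, item,
       sum((`type` + 1) * datediff(concat('2013-', `time`), '2013-06-01') / 7) as fScore
from A
group by `user`, item
order by fScore desc;
```

On the three sample rows this should give roughly 13.57 for (2, 101), 9.14 for (1, 102) and 2.14 for (1, 101), that is 95/7, 2 * 32/7 and 15/7.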

See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF for all the built-in functions.

Also, if the weight is not (type + 1), you can replace that part with a case statement. For example:

select user, item, case type when 0 then 1 when 1 then 2 when 2 then 3...
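
Spelled out for the four weights given in the question (types 0 to 3 map to weights 1 to 4), the case variant might look like the sketch below, aggregated per (user, item) as the question asks. Since type is a string column, the branches compare against string literals. The else branch is an assumption for unexpected type values and is not part of the original answer; the backticks are again only a precaution against reserved words.

```sql
-- Look up the weight with an explicit case expression instead of (type + 1).
select `user`, item,
       sum(case `type`
             when '0' then 1
             when '1' then 2
             when '2' then 3
             when '3' then 4
             else 0  -- assumption: unknown types contribute nothing
           end
           * datediff(concat('2013-', `time`), '2013-06-01') / 7) as fScore
from A
group by `user`, item
order by fScore desc;
```

With these particular weights the result should match the (type + 1) form; the case form only pays off when the weights follow no simple formula.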

Define this map-reduce function (UDF) in Hive
