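I'm having trouble with rank() in HiveQL. I found several implementations of a rank UDF on the web, such as Edward's nice example. I can load and call these functions, but I can't get them to do what I want. Here is a detailed example:
Compile the UDF and load it into the Hive CLI session: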
$ javac -classpath /home/hadoop/hadoop/hadoop-core-1.0.4.jar:/home/hadoop/hive/lib/hive-exec-0.10.0.jar com/m6d/hiveudf/Rank2.java
$ jar -cvf Rank2.jar com/m6d/hiveudf/Rank2.class
hive> ADD JAR /home/hadoop/MyDemo/Rank2.jar;
hive> CREATE TEMPORARY FUNCTION Rank2 AS 'com.m6d.hiveudf.Rank2';
Create a table:
create table purchases (
SalesRepId String,
PurchaseOrderId INT,
Amount INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Load the data from this CSV file:
Jana,1,100
Nadia,2,200
Nadia,3,600
Daniel,4,80
Jana,5,120
William,6,170
Daniel,7,140
From the CLI:
LOAD DATA
LOCAL INPATH '/home/hadoop/MyDemo/purchases.csv'
INTO TABLE purchases;
Now I can see my top sales reps:
select SalesRepId,sum(amount) as volume
from purchases
group by SalesRepId
ORDER BY volume DESC;
Nadia sold $800 worth of stuff, Daniel and Jana each sold $220, and William sold $170.
SalesRep Amount
-------- ------
Nadia 800
Daniel 220
Jana 220
William 170
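To make the grouping step concrete, here is a plain-Java sketch (not Hive code; the class name VolumeByRep is hypothetical) of what the GROUP BY/ORDER BY query above computes from the CSV rows. Hive does not promise any order between tied rows such as Daniel and Jana; in this sketch the tie comes out alphabetical only because a TreeMap feeds a stable sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java equivalent of:
//   SELECT SalesRepId, SUM(amount) AS volume FROM purchases
//   GROUP BY SalesRepId ORDER BY volume DESC
class VolumeByRep {
    // Each input line has the CSV layout SalesRepId,PurchaseOrderId,Amount
    static List<Map.Entry<String, Integer>> volumes(List<String> csvLines) {
        Map<String, Integer> sums = new TreeMap<>();      // GROUP BY SalesRepId
        for (String line : csvLines) {
            String[] fields = line.split(",");            // FIELDS TERMINATED BY ','
            sums.merge(fields[0], Integer.parseInt(fields[2]), Integer::sum); // SUM(amount)
        }
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(sums.entrySet());
        // ORDER BY volume DESC; List.sort is stable, so ties keep alphabetical order
        rows.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return rows;
    }
}
```

For the seven CSV rows above it returns Nadia 800, Daniel 220, Jana 220, William 170.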
Now all I want to do is number them: Nadia is #1, Daniel and Jana are tied at #2, and William is #4 (not #3).
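This numbering is usually called standard competition ranking ("1224" ranking), the same scheme SQL's RANK() uses: a row's rank is 1 plus the number of rows with a strictly larger volume. A minimal plain-Java reference sketch of that definition (the class CompetitionRank is hypothetical, and the O(n^2) loop is chosen for clarity, not speed) can serve as a check for any streaming UDF:

```java
// Reference definition of competition ranking: rank = 1 + count of strictly larger values.
// Needs no sorting and no state, so the input order does not matter.
class CompetitionRank {
    static long[] ranks(int[] volumes) {
        long[] out = new long[volumes.length];
        for (int i = 0; i < volumes.length; i++) {
            long larger = 0;
            for (int v : volumes) {
                if (v > volumes[i]) {
                    larger++;               // every strictly larger volume pushes this row down
                }
            }
            out[i] = 1 + larger;
        }
        return out;
    }
}
```

For example, CompetitionRank.ranks(new int[]{800, 220, 220, 170}) returns {1, 2, 2, 4}.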
select SalesRepId, V.volume,rank2(V.volume)
from
(select SalesRepId,sum(amount) as volume
from purchases
group by SalesRepId
ORDER BY volume DESC) V;
This is what I get, but it's not what I want:
SalesRep Amount Rank
-------- ------ ----
Nadia 800 1
Daniel 220 1
Jana 220 2
William 170 1
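The 1, 1, 2, 1 numbering is what a counter that resets on every key change produces. Assuming Rank2 uses the evaluate() from Edward's guide (the version quoted in the second answer below), this plain-Java port reproduces the output; the class name ResetPerKeyRank and the Objects.equals comparison standing in for sameAsPreviousKey/copyToPreviousKey are hypothetical simplifications. Because rank2 receives V.volume as its key, Daniel and Jana share the 220 run and get 1 and 2, while Nadia and William each start a new run at 1.

```java
import java.util.Objects;

// Port of a rank UDF whose counter restarts whenever the key changes:
// it numbers rows within each run of equal keys, not across the whole result.
class ResetPerKeyRank {
    private long counter;
    private Object previousKey;     // stands in for the UDF's stored previous key
    private boolean hasPrevious;

    long evaluate(Object currentKey) {
        if (!hasPrevious || !Objects.equals(previousKey, currentKey)) {
            counter = 0;            // new key: restart numbering
            previousKey = currentKey;
            hasPrevious = true;
        }
        return ++counter;           // 1, 2, 3, ... within the same key
    }
}
```

Calling evaluate with 800, 220, 220, 170 in that order returns 1, 1, 2, 1, exactly the listing above.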
This is what I want, but I can't get Hive to do it for me:
SalesRep Amount Rank
-------- ------ ----
Nadia 800 1
Daniel 220 2
Jana 220 2
William 170 4
Can you help me rank my sales reps with the right HiveQL?
Thanks to JtheRocker for the response. His change produced this listing:
SalesRep Amount Rank
-------- ------ ----
William 170 1
Daniel 220 2
Jana 220 2
Nadia 800 3
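A slight modification shows Nadia as #4 (not #3):
private long counter;     // rank handed out for the current key
private long row_number;  // 1-based position of the current row
@Override
public Object evaluate(DeferredObject[] currentKey) throws HiveException {
row_number++;
if (!sameAsPreviousKey(currentKey)) {
// new key: jump the rank to the current row position, leaving a gap after ties
this.counter = row_number;
copyToPreviousKey(currentKey);
}
// same key: keep the rank of the first tied row
return Long.valueOf(this.counter);
}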
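A plain-Java port of the modified evaluate() (hypothetical class GapRank, with Objects.equals standing in for sameAsPreviousKey/copyToPreviousKey) shows that it leaves a gap after ties. Like the UDF, it only works if rows reach it already sorted by volume in a single stream; the sketch does not model how Hive distributes rows to reducers.

```java
import java.util.Objects;

// Port of the row_number-based evaluate(): every row advances rowNumber,
// and the rank jumps to rowNumber only when the key changes (1, 2, 2, 4).
class GapRank {
    private long rowNumber;         // 1-based position of the current row
    private long counter;           // rank handed out for the current key
    private Object previousKey;
    private boolean hasPrevious;

    long evaluate(Object currentKey) {
        rowNumber++;
        if (!hasPrevious || !Objects.equals(previousKey, currentKey)) {
            counter = rowNumber;    // new key: rank equals this row's position
            previousKey = currentKey;
            hasPrevious = true;
        }
        return counter;             // tied keys reuse the first row's rank
    }
}
```

Feeding it 800, 220, 220, 170 returns 1, 2, 2, 4; feeding the ascending order from the listing above (170, 220, 220, 800) also returns 1, 2, 2, 4, which matches Nadia ending up at #4.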
Answer 0 (score: 7)
With the windowing and analytics functions introduced in Hive 0.11, you can use:
select SalesRepId, volume as amount,
rank() over (order by V.volume desc) as rank
from (select SalesRepId, sum(amount) as volume
from purchases
group by SalesRepId) V;
Answer 1 (score: 1)
Assuming you use the function directly from the guide you mentioned, your evaluate function looks like this:
private long counter;
@Override
public Object evaluate(DeferredObject[] currentKey) throws HiveException {
if (!sameAsPreviousKey(currentKey)) {
this.counter = 0;
copyToPreviousKey(currentKey);
}
return new Long(++this.counter);
}
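Try changing it to the following. The counter is no longer reset when a new volume appears; instead it stays the same for a repeated volume and is incremented only when a new volume appears.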
private long counter;
@Override
public Object evaluate(DeferredObject[] currentKey) throws HiveException {
// when the key differs from the previous one, increment instead of resetting
if (!sameAsPreviousKey(currentKey)) {
this.counter++;
copyToPreviousKey(currentKey);
}
// otherwise keep the counter as it is, so tied keys share a value
return Long.valueOf(this.counter);
}
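This change numbers distinct keys consecutively, which SQL calls dense ranking (Hive 0.11 and later also provide it as the dense_rank() window function). A plain-Java port for checking it outside Hive, with the hypothetical class DenseRank and Objects.equals standing in for the UDF's key-comparison helpers, and assuming the rows arrive sorted by volume:

```java
import java.util.Objects;

// Port of the incrementing evaluate(): the counter advances once per new key
// and is returned unchanged for repeated keys, so there are no gaps after ties.
class DenseRank {
    private long counter;
    private Object previousKey;
    private boolean hasPrevious;

    long evaluate(Object currentKey) {
        if (!hasPrevious || !Objects.equals(previousKey, currentKey)) {
            counter++;              // advance only on a new key
            previousKey = currentKey;
            hasPrevious = true;
        }
        return counter;             // repeated key: same value as before
    }
}
```

For 800, 220, 220, 170 it returns 1, 2, 2, 3, the same no-gap numbering reported in the question's update; the row-position variant in the question is what produces 1, 2, 2, 4 instead.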
Let me know if this helps.