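How do I find the Nth largest and smallest number in Pig and Hive?

Time: 2017-07-01 13:30:02

Tags: hadoop hive apache-pig bigdata

I have a table with the columns id, name and salary. I want to find the Nth largest and the Nth smallest salary in the table.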

Id  Name  Salary
----------------
1   aa    11111
2   bb    77777
3   cc    33333
4   dd    44444
5   ee    99999

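2 answers:

Answer 0: (score: 0)

You can use either LIMIT or RANK. With LIMIT: load the data and sort it by salary in descending order so the largest salaries come first, then limit the result to N records. That leaves N records, with the Nth largest at the bottom. Sort those records again in ascending order, which brings the Nth record to the top, and apply LIMIT 1 to keep only that record. The minimum works the same way with the directions swapped: sort ascending, limit to N records, sort descending, then LIMIT 1.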

A = LOAD 'data.txt' USING PigStorage(',') AS (id:int,name:chararray,salary:int);
B = ORDER A BY salary DESC;
C = LIMIT B 4; --Note: N = 4
D = ORDER C BY salary ASC;
E = LIMIT D 1;
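
The answer describes the minimum case and mentions RANK, but only shows the LIMIT version for the maximum. A minimal sketch of both, assuming the same comma-delimited data.txt and schema as above; the relation names are illustrative, and the rank column is referenced by position ($0) rather than by name:

```pig
A = LOAD 'data.txt' USING PigStorage(',') AS (id:int, name:chararray, salary:int);

-- Nth smallest salary with ORDER/LIMIT (N = 4): the same steps with the sort directions swapped
B = ORDER A BY salary ASC;
C = LIMIT B 4;
D = ORDER C BY salary DESC;
E = LIMIT D 1;

-- Nth largest salary with RANK (N = 4): RANK prepends the rank as the first field
R = RANK A BY salary DESC DENSE;
S = FILTER R BY $0 == 4;
T = FOREACH S GENERATE id, name, salary;
```

With the five sample rows, E should hold the row for bb (77777) and T the row for cc (33333). DENSE keeps ranks consecutive when salaries tie, so N counts distinct salaries, whereas the LIMIT version counts rows.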

Answer 1: (score: 0)

In Hive:

For the nth MAX salary

Query:

select id,name,salary from (
select id,name,salary,rank() over(ORDER BY salary DESC) ran from salarytable ) s 
where ran=1;

Output:

5 ee 99999

For the nth MIN salary

Query:

select id,name,salary from (
select id,name,salary,rank() over(ORDER BY salary ASC) ran from salarytable ) s 
where ran=4;

Output:

2 bb 77777

P.S.: The number compared with ran in the where clause defines the nth value here.