Question

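I'm looking for ideas for an Impala query.

Let me try to explain my problem: it's all about ordering IDs. I have a table with different types of IDs: a head ID and a kind of sub ID (one head ID has at most 150 sub IDs).

Numbering them with a window function (ROW_NUMBER() OVER (PARTITION BY)) is no problem. The main problem is that they have a specific order, and that order is stored in a second table.

For each sub_ID, the second table contains which ID comes before it and which comes after it.

I managed to order the partitions and identify the first ID, but I don't know how to do the ordering based on the other table.

Let me try to show you an example: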

Table 1

head_ID sub_ID
1        001
1        002
1        003
2        011
2        012
2        013
2        014

Table 2

sub_ID begin_ID end_ID
002     003      001
012     011      013

I hope that makes sense.

Answer 1

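I think you will have to do this in two steps: first work out the sort order, then run the actual query. I suspect that getting the sort order could be expensive (I can't think of a way that doesn't loop or recurse), so you should avoid doing it more often than necessary if you can. If your table2 doesn't change often, and you can change the design, I would be tempted to store the actual sort order in an extra column in that table.

Any time table2 is modified, you will have to update the sort order before you can run this query again. Because updating the sort order may take several edits to table2, and the order will effectively be broken until the last of them, I would probably put a trigger on the table that sets a flag indicating that the order needs updating. You can check the flag before running this query or in a nightly maintenance run (whichever comes first), and update the order if needed.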
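As a rough illustration of the flag idea, here is a minimal sketch that uses Python's built-in sqlite3 module purely as a runnable stand-in. The order_state table, the trigger names, and the needs_update / rebuild_if_needed helpers are all hypothetical names made up for this example, and whether your engine offers triggers at all is something you would need to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (sub_ID TEXT, begin_ID TEXT, end_ID TEXT);

-- Hypothetical one-row table holding the "order is stale" flag.
CREATE TABLE order_state (needs_update INTEGER NOT NULL);
INSERT INTO order_state VALUES (0);

-- Any change to table2 marks the stored sort order as stale.
CREATE TRIGGER table2_ins AFTER INSERT ON table2
BEGIN UPDATE order_state SET needs_update = 1; END;
CREATE TRIGGER table2_upd AFTER UPDATE ON table2
BEGIN UPDATE order_state SET needs_update = 1; END;
CREATE TRIGGER table2_del AFTER DELETE ON table2
BEGIN UPDATE order_state SET needs_update = 1; END;
""")

def needs_update(conn):
    """Return True if table2 changed since the last rebuild."""
    return conn.execute("SELECT needs_update FROM order_state").fetchone()[0] == 1

def rebuild_if_needed(conn, rebuild):
    """Run the (expensive) rebuild only when the flag is set, then clear it."""
    if not needs_update(conn):
        return False
    rebuild(conn)
    conn.execute("UPDATE order_state SET needs_update = 0")
    return True

# Several edits in a row only set the flag; nothing is recomputed yet.
conn.execute("INSERT INTO table2 VALUES ('002', '003', '001')")
conn.execute("INSERT INTO table2 VALUES ('012', '011', '013')")
flag_after_edits = needs_update(conn)

# Before the real query (or in nightly maintenance), rebuild once.
# The lambda is a placeholder for the actual sort-order computation.
rebuilt = rebuild_if_needed(conn, lambda c: None)
flag_after_rebuild = needs_update(conn)
```

The point of the design is that a batch of edits costs nothing extra; the sort order is recomputed at most once, just before it is next needed.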

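If you can't change the database, you can work out the sort order before each query, but depending on your data it could slow things down a lot.

Either way, to work out the sort order you need to create an orderNo for each row in table2 (either by updating the rows directly, or in a separate temporary table). It looks like there are many order lists in the data, one for each head_ID. You can build the order by first finding the start row of each chain (I'm assuming its begin_ID is null) and giving it orderNo 1, then looping: find the rows that should get the next orderNo, assign it to them, and repeat until no more rows are found. If there are rows left without an orderNo at the end, there is a problem with the data.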

--Set up the work table
DECLARE @Work TABLE (Sub_ID int, orderNo int);

-- Set up the start of each order list
insert into @Work (sub_ID, orderNo)
Select sub_ID, 1
from table2 
where begin_ID is null;

DECLARE @Finished int = 0;  --Flag to see if we're done
DECLARE @NextOrder int = 2; --Next order number to process

While @Finished = 0
BEGIN
    -- add the next level for all the order lists
    insert into @Work (sub_ID, orderNo)
    Select t2.sub_ID, @NextOrder
    From table2 t2
        inner join @Work w on w.sub_ID = t2.begin_ID       -- We want rows that are next in an order chain
        left outer join @Work w2 on w2.sub_ID = t2.sub_ID  -- and haven't already been done (to avoid loops)
    Where w2.Sub_ID is null;

    IF @@ROWCOUNT = 0 SET @Finished = 1;  -- flag if nothing was inserted (so stop)
    SET @NextOrder = @NextOrder + 1;  -- next order level to add
END;

-- Example usage of the order table. Note that any record where w.sub_ID is null
-- was not in a reachable order list (either the table2 record does not
-- exist, or the order list walk never reached it).
Select t1.*
from table1 t1
    left outer join @Work w on w.sub_ID = t1.sub_id
order by t1.head_id, w.orderNo;

You could also do this with a recursive CTE, but the theory is the same. That would let you do everything in a single query, if IMPALA needs it done that way.
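For reference, a recursive CTE version might look roughly like the sketch below. It is run here through Python's built-in sqlite3 module only so that it can be executed as written; whether your Impala version accepts recursive CTEs is something to verify. The table2 rows are an assumed completion of the question's sample (the question shows only two rows), with a null begin_ID marking the start of each chain, and the cap of 150 levels comes from the question's statement that a head ID has at most 150 sub IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (head_ID INTEGER, sub_ID TEXT);
CREATE TABLE table2 (sub_ID TEXT, begin_ID TEXT, end_ID TEXT);

INSERT INTO table1 VALUES
    (1, '001'), (1, '002'), (1, '003'),
    (2, '011'), (2, '012'), (2, '013'), (2, '014');

-- Assumed sample data: one row per sub_ID, begin_ID NULL at each chain start.
INSERT INTO table2 VALUES
    ('003', NULL,  '002'), ('002', '003', '001'), ('001', '002', NULL),
    ('011', NULL,  '012'), ('012', '011', '013'),
    ('013', '012', '014'), ('014', '013', NULL);
""")

ORDER_QUERY = """
WITH RECURSIVE work(sub_ID, orderNo) AS (
    -- Anchor: the first row of every chain gets orderNo 1.
    SELECT sub_ID, 1
    FROM table2
    WHERE begin_ID IS NULL

    UNION ALL

    -- Step: a row whose predecessor is already placed gets the next number.
    -- The cap guards against endless recursion if the data contains a cycle.
    SELECT t2.sub_ID, w.orderNo + 1
    FROM table2 t2
        JOIN work w ON w.sub_ID = t2.begin_ID
    WHERE w.orderNo < 150
)
SELECT t1.head_ID, t1.sub_ID, w.orderNo
FROM table1 t1
    LEFT JOIN work w ON w.sub_ID = t1.sub_ID
ORDER BY t1.head_ID, w.orderNo
"""

rows = conn.execute(ORDER_QUERY).fetchall()
for head_id, sub_id, order_no in rows:
    print(head_id, sub_id, order_no)
```

As with the loop version, a row that comes back with a null orderNo was never reached by the chain walk, which points to missing or inconsistent rows in table2.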
