Question

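I have two tables: 1) One is invoices, which has thousands of rows. My INVOICES table holds customers' invoices and their amounts. 2) The other is debts. My DEBTS table holds each customer's total invoice debt. My goal is to find the invoices whose amounts sum closest to the debt. For example, I have these tables: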

DEBTS table:

    CUSTOMER_ID         TOTAL_DEBTS
      3326660                444$      
      2789514                165$     
      4931541                121$

INVOICES table:

CUSTOMER_ID       INVOICE_ID        AMOUNT_OF_INVOICE
  3326660              1a                   157$ 
  3326660              1b                   112$ 
  3326660              1c                   10$ 
  3326660              1d                   94$ 
  3326660              1e                   47$ 
  3326660              1f                   35$ 
  3326660              1g                   14$ 
  3326660              1h                   132$ 
  3326660              1i                   8$ 
  3326660              1j                   60$ 
  3326660              1k                   42$ 
  2789514              2a                   86$ 
  2789514              2b                   81$
  2789514              2c                   99$
  2789514              2d                   61$
  2789514              2e                   16$
  2789514              2f                   83$
  4931541              3a                   11$
  4931541              3b                   14$
  4931541              3c                   17$
  4931541              3d                   121$
  4931541              3e                   35$
  4931541              3f                   29$

My target table is:

CUSTOMER_ID        TOTAL_DEBTS     CALCULATED_AMOUNT        INVOICES_ID   
  3326660              444$              444$              1a,1b,1f,1h,1i    
  2789514              165$              164$                   2b,2f
  4931541              121$              121$                    3d

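Because my tables hold thousands of rows, performance is very important to me. I found this code on Stack Overflow: closest subset sum

However, its performance is poor. I need to stop the addition loop as soon as calculated_amount equals total_debts.

Thanks for your help.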

Answer 1

Use a recursive query:

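with 
    -- t1: join debts to invoices and number each customer's invoices (rn)
    t1 as ( 
        select customer_id cid, total_debts dbt, invoice_id iid, amount_of_invoice amt, 
               row_number() over (partition by customer_id order by invoice_id) rn
          from debts d join invoices i using (customer_id) ),
    -- t2: recursively build combinations, extending only with a higher rn
    --     and only while the running sum (sma) stays within the debt
    t2 (cid, iid, ams, dbt, amt, sma, rn) as ( 
        select cid, cast(iid as varchar2(4000)), cast(amt as varchar2(4000)), 
               dbt, amt, amt, rn
          from t1 
         where amt <= dbt  -- added: a single invoice above the debt would make dbt - sma negative and rank first
        union all 
        select t2.cid, 
               t2.iid || ', ' || t1.iid,
               t2.ams || ', ' || t1.amt,
               t2.dbt, t2.amt, t1.amt + t2.sma, t1.rn
          from t2 
          join t1 on t1.cid = t2.cid and t1.rn > t2.rn and t2.sma + t1.amt <= t1.dbt),
    -- t3: rank each customer's combinations by distance from the debt
    t3 as (
        select t2.*, rank() over (partition by cid order by dbt - sma ) rnk
          from t2)
-- keep the closest combination(s); ties are all returned
select cid, iid, ams, dbt, sma from t3 where rnk = 1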

Output:

    CID      IID                         AMS                           DBT  SMA
    -------  --------------------------  ----------------------------  ---  ---
    2789514  2b, 2f                      81, 83                        165  164
    3326660  1a, 1d, 1e, 1g, 1h          157, 94, 47, 14, 132          444  444
    3326660  1b, 1c, 1d, 1e, 1f, 1g, 1h  112, 10, 94, 47, 35, 14, 132  444  444
    3326660  1a, 1c, 1f, 1h, 1i, 1j, 1k  157, 10, 35, 132, 8, 60, 42   444  444
    3326660  1a, 1b, 1f, 1h, 1i          157, 112, 35, 132, 8          444  444
    4931541  3d                          121                           121  121

    6 rows selected

Subquery T1 joins the two tables and adds the column rn, which is used to combine the data. T2 is hierarchical and does the main part of the work: it combines the data until the sum reaches the debt. T3 filters the best solutions using the rank function. As you can see, for CID 3326660 there are four best combinations.
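To make the combination step easier to follow, here is a rough Python sketch of the same idea for a single customer. It is not part of the original answer: the function name best_combinations and the (invoice_id, amount) tuple layout are assumptions made for illustration. It extends each combination only with later invoices and only while the sum stays within the debt, then keeps every combination that ties for the largest sum.

```python
def best_combinations(invoices, debt):
    """invoices: list of (invoice_id, amount) pairs, sorted by invoice_id.

    Returns every (total, [invoice ids]) whose total is the largest
    one that does not exceed debt. Hypothetical helper for illustration.
    """
    found = []  # every combination whose sum stays within the debt

    def extend(start, ids, total):
        # Only look at invoices after position `start`, like the rn ordering
        # in the query, so each subset is generated exactly once.
        for pos in range(start, len(invoices)):
            iid, amt = invoices[pos]
            if total + amt <= debt:  # prune sums that pass the debt
                found.append((total + amt, ids + [iid]))
                extend(pos + 1, ids + [iid], total + amt)

    extend(0, [], 0)
    if not found:
        return []
    best = max(total for total, _ in found)
    return [(total, ids) for total, ids in found if total == best]
```

With the sample rows for customer 2789514 and a debt of 165, this returns [(164, ['2b', '2f'])]. Like the query, it enumerates every qualifying subset, so the work grows exponentially with the number of invoices per customer.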

Be warned: recursive subqueries are slow on large amounts of data, and this solution will not work there.
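If the query is too slow, one possible alternative is to do the search outside the database with a dynamic-programming table of reachable sums. The sketch below is a different technique from the recursive query, not something the original answer proposes. It assumes amounts are non-negative whole numbers, and the name closest_subset is hypothetical. It returns a single best combination and stops as soon as a sum equals the debt, which is the early exit the question asks for.

```python
def closest_subset(invoices, debt):
    """invoices: list of (invoice_id, amount) pairs with integer amounts.

    Returns (best_total, [invoice ids]) where best_total is the largest
    reachable sum that does not exceed debt. Hypothetical helper.
    """
    # Map each reachable sum to one list of invoice ids that produces it.
    reachable = {0: []}
    for iid, amt in invoices:
        # Iterate over a snapshot so each invoice is used at most once.
        for total, ids in list(reachable.items()):
            new_total = total + amt
            if new_total <= debt and new_total not in reachable:
                reachable[new_total] = ids + [iid]
                if new_total == debt:
                    # Exact match found: stop adding invoices.
                    return new_total, reachable[new_total]
    best = max(reachable)
    return best, reachable[best]
```

For customer 4931541 with a debt of 121 this returns (121, ['3d']). The table holds at most debt + 1 sums, so the work per customer is bounded by the number of invoices times the debt rather than by the number of subsets. The trade-off is that only one combination is reported when several tie.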

How to find the closest subset sum in Oracle
