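I want to find the single nearest neighbour for each customer in table `TG`. The decision must be based on the distance calculated over the `val` columns.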
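One possible solution is to cross join the table with itself. But `TG` has 100k rows and the initial table has 50 million, so the output would be huge. So I came up with the idea of using window functions.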
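For scale, here is the size of that cross join. It takes the row counts above at face value: pairing every `TG` customer with every row of the initial table means evaluating the distance for this many customer pairs:

$$
|TG| \times |all\_custs| = 10^{5} \times 5 \cdot 10^{7} = 5 \cdot 10^{12}
$$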
However, I can't get the window-function approach to work. What should I do? Here is my attempt:
```sql
SELECT
    dt.cust_id2,
    -- for each cust_id2 from TG, get another cust_id from the all_custs table
    MIN(CASE WHEN dt.cust_id <> dt.cust_id2 THEN dt.cust_id END)
        OVER (PARTITION BY dt.cust_id2
              -- here I want to order by distance, but I need the current value and the
              -- previous one (cur/pref are placeholders); nested window functions aren't allowed
              ORDER BY SQRT(POWER(cur.val1 - pref.val1, 2) + POWER(cur.val2 - pref.val2, 2)))
FROM
(
    SELECT all_custs.cust_id, all_custs.val1, all_custs.val2, aa.cust_id2
    FROM all_custs
    LEFT JOIN (SELECT cust_id AS cust_id2 FROM TG) aa
        ON aa.cust_id2 = all_custs.cust_id
) AS dt
WHERE dt.cust_id2 IS NOT NULL
```
`TG` stores only customer IDs (as numbers). Every customer in `TG` is also present in `all_custs`.
Table `all_custs`:

| cust_id (number) | val1 (decimal) | val2 (decimal) |
|------------------|----------------|----------------|
| 123123131        | 123.1          | 2              |
| 234234241        | 75.15          | 5              |
| 525165354        | 676.12         | 3              |
For cust_id = 123123131, the closest one would be customer 234234241. There can be more `val` columns than the two shown.
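To make the distance rule concrete, here is a minimal brute-force sketch on the three sample rows above. It is plain Python, not SQL. It assumes plain Euclidean distance over all `val` columns with no scaling, and it compares one customer with every other customer. That is the same O(n²) work as the cross join, so it only illustrates what "nearest" means here. `nearest_neighbour` and `ALL_CUSTS` are hypothetical names, and `math.dist` needs Python 3.8+.

```python
import math
from typing import Dict, Optional, Tuple

# Sample rows from all_custs: cust_id -> (val1, val2, ..., valN)
ALL_CUSTS: Dict[int, Tuple[float, ...]] = {
    123123131: (123.1, 2.0),
    234234241: (75.15, 5.0),
    525165354: (676.12, 3.0),
}


def nearest_neighbour(cust_id: int, custs: Dict[int, Tuple[float, ...]]) -> Tuple[Optional[int], float]:
    """Brute force: compare one customer with every other customer, keep the closest."""
    target = custs[cust_id]
    best_id: Optional[int] = None
    best_dist = math.inf
    for other_id, vals in custs.items():
        if other_id == cust_id:
            continue  # never pick the customer itself (cust_id <> cust_id2)
        # Euclidean distance over however many val columns there are
        d = math.dist(target, vals)
        if d < best_dist:
            best_id, best_dist = other_id, d
    return best_id, best_dist


print(nearest_neighbour(123123131, ALL_CUSTS))  # (234234241, ~48.04)
```

Because the distance is taken over whole tuples, adding `val3`, `val4`, … only means longer tuples. The search logic stays the same.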
UPD1: For reference, this is how it can be done with a cross join, but it shouldn't be done this way:
```sql
SELECT tg.cust_id AS tg_cust_id,
       cg.cust_id AS cg_cust_id,
       SQRT(
             POWER((tg.val1 - cg.val1) / cg.max_val1, 2)  -- min = 0
           + POWER((tg.val2 - cg.val2) / cg.max_val2, 2)  -- min = 0
       ) AS dist
FROM (
    -- TG customers (restricted to branch 95)
    SELECT arpau.cust_id, arpau.val1, arpau.val2
    FROM all_custs arpau
    JOIN TG aa ON aa.cust_id2 = arpau.cust_id
    WHERE arpau.branch_id = 95
) tg
JOIN (
    -- candidate neighbours: customers NOT in TG, plus the overall maxima used for scaling
    SELECT arpau.cust_id,
           arpau.val1,
           MAX(arpau.val1) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS max_val1,
           arpau.val2,
           MAX(arpau.val2) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS max_val2
    FROM all_custs arpau
    LEFT JOIN TG aa ON aa.cust_id2 = arpau.cust_id
    WHERE aa.cust_id2 IS NULL
) cg ON tg.cust_id <> cg.cust_id
QUALIFY ROW_NUMBER() OVER (PARTITION BY tg.cust_id ORDER BY dist) = 1
```
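The logic of the UPD1 query can be mirrored in a small Python sketch. The sketch does three things:

1. Candidates are the customers not in `TG`.
2. Each column is divided by its maximum over those candidates.
3. The row with the smallest scaled distance is kept, as `ROW_NUMBER() ... = 1` does.

The `branch_id = 95` filter is left out because the sample table has no such column. Treating only 123123131 as a `TG` member is a hypothetical setup, and `nearest_outside_tg` is an illustrative name. Like the query, the sketch assumes every column maximum is non-zero.

```python
import math
from typing import Dict, Optional, Set, Tuple

Row = Tuple[float, ...]  # (val1, val2, ..., valN)


def nearest_outside_tg(all_custs: Dict[int, Row], tg: Set[int]) -> Dict[int, Optional[int]]:
    """For each TG customer, return the closest customer that is NOT in TG."""
    # Candidate neighbours: the LEFT JOIN TG ... WHERE aa.cust_id2 IS NULL subquery
    candidates = {cid: vals for cid, vals in all_custs.items() if cid not in tg}
    if not candidates:
        return {tg_id: None for tg_id in tg}

    # Per-column maxima over the candidates (max_val1, max_val2, ...);
    # assumed non-zero, as the query also assumes when dividing by them
    n_cols = len(next(iter(candidates.values())))
    maxima = [max(vals[i] for vals in candidates.values()) for i in range(n_cols)]

    result: Dict[int, Optional[int]] = {}
    for tg_id in tg:
        target = all_custs[tg_id]
        best_id: Optional[int] = None
        best_dist = math.inf
        for cand_id, vals in candidates.items():
            # Scaled Euclidean distance, the DIST expression of the query
            dist = math.sqrt(sum(((t - v) / m) ** 2 for t, v, m in zip(target, vals, maxima)))
            # Keep the smallest DIST, like QUALIFY ROW_NUMBER() ... ORDER BY DIST = 1
            if dist < best_dist:
                best_id, best_dist = cand_id, dist
        result[tg_id] = best_id
    return result


# Sample rows from all_custs; TG = {123123131} is a hypothetical choice
ALL_CUSTS: Dict[int, Row] = {
    123123131: (123.1, 2.0),
    234234241: (75.15, 5.0),
    525165354: (676.12, 3.0),
}
print(nearest_outside_tg(ALL_CUSTS, {123123131}))  # {123123131: 234234241}
```

Candidates exclude `TG` customers, so the query's `tg.cust_id <> cg.cust_id` condition is already implied. On ties, this sketch keeps the first candidate it sees. `ROW_NUMBER()` picks an arbitrary one among equal distances.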