We have a dataset where users (distinct) have facilities (multiple), which contain accounts (multiple), which have holdings (multiple).

I've run into cases of duplication. For example, user_ID='A' has facility_ID='1' with account_ID in ('A','B') and facility_ID='2' with account_ID in ('C','D'), and the count(accounts), sum(holdings amount), and holdings_amount values are the same for both facilities.

SQL that provides appropriate data: http://sqlfiddle.com/#!15/697f6/1

user_id  facility_id  facility_name  account_id  holdings_amount
A        1            Fidelity       A           100
A        1            Fidelity       A           200
A        1            Fidelity       B           300
A        1            Fidelity       B           400
A        2            Fidelity       C           200
A        2            Fidelity       C           100
A        2            Fidelity       D           400
A        2            Fidelity       D           300
A        3            Fidelity       E           100
A        3            Fidelity       E           200
A        3            Fidelity       F           700
A        4            Fidelity       G           200
A        4            Fidelity       G           100
A        4            Fidelity       H           400
A        4            Fidelity       H           300

What I'd like to do is:

1. at the user level,
2. where count(facilities) > 1 for a facility_name (note that this can be > 2),
3. count(accounts) of a facility = count(accounts) of another facility,
4. count(holdings_amount) of an account = count(holdings_amount) of another account,
5. sum(holdings_amount) of an account = sum(holdings_amount) of another account,
6. each holdings amount value of an account equals a holdings amount value of the other account (in any order),

then exclude the duplicate facility (i.e. the accounts associated with it).

So the expected output is:

user_id  facility_id  facility_name  account_id  holdings_amount
A        1            Fidelity       A           100
A        1            Fidelity       A           200
A        1            Fidelity       B           300
A        1            Fidelity       B           400
A        3            Fidelity       E           100
A        3            Fidelity       E           200
A        3            Fidelity       F           700
A        4            Fidelity       G           200
A        4            Fidelity       G           100
A        4            Fidelity       H           400
A        4            Fidelity       H           300

This is because facility 2 violates all 6 points, facility 3 does not violate point 4, and facility 4 does not violate point 6.

Please let me know if anything is unclear or if I can provide more details. Thanks!
Answer 0 (score: 1)
Here is an idea I had, although it doesn't seem to return any results in your fiddle.
select
    a2.id,
    count(h1.id), count(h2.id), count(distinct a1.id), count(distinct a2.id)
from
    (
        facilities f1
        inner join accounts a1 on a1.facility_id = f1.id
        inner join holdings h1 on h1.acc_id = a1.id
    )
    full outer join
    (
        facilities f2
        inner join accounts a2 on a2.facility_id = f2.id
        inner join holdings h2 on h2.acc_id = a2.id
    )
        on f2.id <> f1.id
        and a2.id > a1.id
        and f2.facility_name = f1.facility_name
        and h2.holdings_amount = h1.holdings_amount
group by a2.id
having
    count(h1.id) = count(h2.id)
    and count(distinct a1.id) = count(distinct a2.id)
    and sum(h1.holdings_amount) = sum(h2.holdings_amount)
    and count(h1.id) = count(*) and count(h2.id) = count(*);
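Looking back, I realize you have constraints at multiple levels, and this query won't handle that. It may help get you on the right track, but I can already think of some problems with it.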
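The multi-level constraints in the question can also be approached by reducing each facility to an order-independent signature: sort the amounts within each account, then sort the accounts within the facility, and compare the results. What follows is a minimal Python sketch of that idea, not the SQL from this answer. It assumes the rows have already been fetched, uses only facilities 1 to 3 from the question's sample, and the function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the question's sample (facilities 1 to 3 only):
# (user_id, facility_id, facility_name, account_id, holdings_amount)
ROWS = [
    ("A", 1, "Fidelity", "A", 100),
    ("A", 1, "Fidelity", "A", 200),
    ("A", 1, "Fidelity", "B", 300),
    ("A", 1, "Fidelity", "B", 400),
    ("A", 2, "Fidelity", "C", 200),
    ("A", 2, "Fidelity", "C", 100),
    ("A", 2, "Fidelity", "D", 400),
    ("A", 2, "Fidelity", "D", 300),
    ("A", 3, "Fidelity", "E", 100),
    ("A", 3, "Fidelity", "E", 200),
    ("A", 3, "Fidelity", "F", 700),
]


def facility_signatures(rows):
    """Map (user_id, facility_name, facility_id) to an order-independent signature."""
    per_account = defaultdict(list)
    for user_id, facility_id, facility_name, account_id, amount in rows:
        per_account[(user_id, facility_name, facility_id, account_id)].append(amount)

    per_facility = defaultdict(list)
    for (user_id, facility_name, facility_id, _acct), amounts in per_account.items():
        # Sorting the amounts makes the comparison ignore holding order.
        per_facility[(user_id, facility_name, facility_id)].append(tuple(sorted(amounts)))

    # Sorting the per-account tuples makes the comparison ignore account ids.
    return {key: tuple(sorted(accts)) for key, accts in per_facility.items()}


def duplicate_facilities(rows):
    """Return facility ids that repeat an earlier facility of the same user and name."""
    signatures = facility_signatures(rows)
    seen = set()
    duplicates = []
    for key in sorted(signatures):
        user_id, facility_name, facility_id = key
        group = (user_id, facility_name, signatures[key])
        if group in seen:
            duplicates.append(facility_id)
        else:
            seen.add(group)
    return duplicates


print(duplicate_facilities(ROWS))  # [2]
```

Keeping the lowest facility_id of each group is an arbitrary choice made for this sketch; the question does not say which copy should survive.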
Answer 1 (score: 0)
with f_agg as (
    -- per-facility aggregates: account count, holding count, holdings total,
    -- and a checksum built from the holding ids
    select f.user_id, f.id, f.facility_name,
        count(distinct a.id) as a_cnt,
        count(distinct h.id) as h_cnt,
        sum(h.holdings_amount) as h_tot,
        sum(cast(h.id as int)) as h_chk
    from
        facilities f
        inner join accounts a on a.facility_id = f.id
        inner join holdings h on h.acc_id = a.id
    group by f.user_id, f.id, f.facility_name
), potential as (
    -- candidate pairs: same user and facility_name, matching aggregates
    select fa1.id as id1, fa2.id as id2
    from f_agg as fa1 cross join f_agg as fa2
    where fa2.id > fa1.id
        and fa2.user_id = fa1.user_id
        and fa2.facility_name = fa1.facility_name
        and fa2.a_cnt = fa1.a_cnt
        and fa2.h_cnt = fa1.h_cnt
        and fa2.h_tot = fa1.h_tot
),
matches as (
    -- keep candidate pairs whose holdings pair up by row number and amount
    select coalesce(p1.id1, p2.id1) as id1, coalesce(p1.id2, p2.id2) as id2
    from
        (
            potential p1
            inner join f_agg fa1 on fa1.id = p1.id1
            inner join accounts a1 on a1.facility_id = fa1.id
            inner join
            (
                select *, row_number() over (partition by acc_id order by id) as rn
                from holdings
            ) h1 on h1.acc_id = a1.id
        )
        full outer join
        (
            potential p2
            inner join f_agg fa2 on fa2.id = p2.id2
            inner join accounts a2 on a2.facility_id = fa2.id
            inner join
            (
                select *, row_number() over (partition by acc_id order by id) as rn
                from holdings
            ) h2 on h2.acc_id = a2.id
        )
            on p2.id1 = p1.id1 and p2.id2 = p1.id2
            and h2.rn = h1.rn and h2.holdings_amount = h1.holdings_amount
    group by coalesce(p1.id1, p2.id1), coalesce(p1.id2, p2.id2)
    -- every joined row must have both sides, and every holding must be used
    having count(h1.id) = count(*)
        and count(h2.id) = count(*)
        and sum(cast(h1.id as int)) = min(fa1.h_chk)
        and sum(cast(h2.id as int)) = min(fa2.h_chk)
)
select * from matches;
Leaving this here in case I come back to play with it some more: http://sqlfiddle.com/#!15/697f6/120
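As far as I can tell, the query above works in two stages: it first narrows facility pairs by comparing cheap aggregates (a_cnt, h_cnt, h_tot), then compares the individual holdings. The Python sketch below illustrates why the second stage is needed, since matching aggregates alone do not prove two facilities are duplicates. It is not a translation of the SQL. The helper names are made up for illustration, and facility_x is hypothetical data that does not appear in the question.

```python
def aggregates(accounts):
    """Return (a_cnt, h_cnt, h_tot) for one facility.

    accounts maps account_id -> list of holdings amounts.
    """
    amounts = [amt for holdings in accounts.values() for amt in holdings]
    return (len(accounts), len(amounts), sum(amounts))


def exact_signature(accounts):
    """Order-independent view of the holdings, grouped per account."""
    return tuple(sorted(tuple(sorted(h)) for h in accounts.values()))


def is_candidate(left, right):
    # Stage 1: cheap filter on counts and total.
    return aggregates(left) == aggregates(right)


def is_duplicate(left, right):
    # Stage 2: only candidates get the exact comparison.
    return is_candidate(left, right) and exact_signature(left) == exact_signature(right)


facility_1 = {"A": [100, 200], "B": [300, 400]}  # from the question
facility_2 = {"C": [200, 100], "D": [400, 300]}  # from the question
facility_3 = {"E": [100, 200], "F": [700]}       # from the question
facility_x = {"X": [150, 150], "Y": [350, 350]}  # hypothetical, not in the question

print(is_candidate(facility_1, facility_3))  # False: holding counts differ (4 vs 3)
print(is_candidate(facility_1, facility_x))  # True: same counts and total
print(is_duplicate(facility_1, facility_x))  # False: the amounts differ
print(is_duplicate(facility_1, facility_2))  # True
```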