I have a table full of addresses. I want to select every account that is at the same address as another account.
If my data looks like this:
------------------------------------
| Account Number | Address |
| 12345 | 55 Bee St |
| 23456 | 94 Water way |
| 34567 | 15 Beagle Drive |
| 45678 | 55 Bee St |
| 56789 | 94 Water way |
| 67890 | 12 Green St |
-------------------------------------
I'd like to do something along the lines of:
SELECT * FROM accounts WHERE group by address > 1;
so that my results would be:
------------------------------------
| Account Number | Address |
| 12345 | 55 Bee St |
| 23456 | 94 Water way |
| 45678 | 55 Bee St |
| 56789 | 94 Water way |
-------------------------------------
If it makes any difference, this is a PostgreSQL database.
Answer 0 (score: 1)
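You need to join the table to itself on the condition that the addresses are equal, while making sure the two rows have different account numbers: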
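select distinct a1.account_number, a1.address
from accounts a1
join accounts a2 on a1.account_number <> a2.account_number
 and a1.address = a2.address
Note the <> comparison between the account numbers: it stops a row from joining to itself. Comparing with > would also suppress the reversed pair, but then the lowest account number at each address would never appear on the a1 side and would be missing from the result. The selected columns are qualified with a1 because both sides of the join have account_number and address, so unqualified names would be ambiguous.
I added distinct in case three or more accounts share the same address; otherwise you wouldn't need it.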
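A minimal sketch for checking this self-join on the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: both accept this plain SQL), assumes the columns are named account_number and address, and runs the join once with <> and once with > so the two results can be compared.

```python
import sqlite3

# The question's sample data; SQLite stands in for PostgreSQL in this sketch.
ROWS = [
    (12345, "55 Bee St"),
    (23456, "94 Water way"),
    (34567, "15 Beagle Drive"),
    (45678, "55 Bee St"),
    (56789, "94 Water way"),
    (67890, "12 Green St"),
]

# {op} is filled with a fixed comparison operator below, never with user input.
SELF_JOIN = """
SELECT DISTINCT a1.account_number, a1.address
FROM accounts a1
JOIN accounts a2
  ON a1.account_number {op} a2.account_number
 AND a1.address = a2.address
ORDER BY a1.account_number
"""


def make_db(rows):
    """Create an in-memory accounts table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (account_number INTEGER, address TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    return conn


conn = make_db(ROWS)
# <> excludes self-matches but keeps both members of every matching pair.
with_ne = conn.execute(SELF_JOIN.format(op="<>")).fetchall()
# > keeps only the higher-numbered account of each pair.
with_gt = conn.execute(SELF_JOIN.format(op=">")).fetchall()
print(with_ne)
print(with_gt)
```

With >, accounts 12345 and 23456 drop out, because no lower-numbered account shares their address.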
Answer 1 (score: 1)
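Left join the table to itself to find records with the same address, group on the fields, and then count the matching addresses to keep only the records that have at least one match: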
select a.AccountNumber, a.Address
from accounts a
left join accounts o on o.Address = a.Address and o.AccountNumber <> a.AccountNumber
group by a.AccountNumber, a.Address
having count(o.AccountNumber) >= 1
This gives you each account number with its address, and it doesn't produce duplicates if an address occurs more than twice.
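A minimal sketch of this LEFT JOIN / GROUP BY / HAVING query, run through Python's sqlite3 module as a stand-in for PostgreSQL. The columns are assumed to be named account_number and address, and account 78901 at 55 Bee St is a hypothetical extra row added to show that an address shared by three accounts still yields exactly one row per account.

```python
import sqlite3

ROWS = [
    (12345, "55 Bee St"),
    (23456, "94 Water way"),
    (34567, "15 Beagle Drive"),
    (45678, "55 Bee St"),
    (56789, "94 Water way"),
    (67890, "12 Green St"),
    (78901, "55 Bee St"),  # hypothetical third account at the same address
]

LEFT_JOIN_HAVING = """
SELECT a.account_number, a.address
FROM accounts a
LEFT JOIN accounts o
  ON o.address = a.address AND o.account_number <> a.account_number
GROUP BY a.account_number, a.address
-- COUNT(column) skips the NULLs produced for accounts with no match
HAVING COUNT(o.account_number) >= 1
ORDER BY a.account_number
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number INTEGER, address TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", ROWS)
result = conn.execute(LEFT_JOIN_HAVING).fetchall()
print(result)
```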
Answer 2 (score: 1)
This should do the trick:
SELECT *
FROM accounts A1
WHERE
    EXISTS (
        SELECT *
        FROM accounts A2
        WHERE
            A1.AccountNumber <> A2.AccountNumber
            AND A1.Address = A2.Address
    )
In plain English: select every account for which some other account (A1.AccountNumber <> A2.AccountNumber) has the same address (A1.Address = A2.Address).
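A minimal sketch of the EXISTS query against the question's sample data, using Python's sqlite3 module as a stand-in for PostgreSQL. The column names account_number and address are assumptions; the answer writes them as AccountNumber and Address.

```python
import sqlite3

ROWS = [
    (12345, "55 Bee St"),
    (23456, "94 Water way"),
    (34567, "15 Beagle Drive"),
    (45678, "55 Bee St"),
    (56789, "94 Water way"),
    (67890, "12 Green St"),
]

# Keep a row from a1 if at least one *other* row shares its address.
EXISTS_QUERY = """
SELECT *
FROM accounts a1
WHERE EXISTS (
    SELECT *
    FROM accounts a2
    WHERE a1.account_number <> a2.account_number
      AND a1.address = a2.address
)
ORDER BY a1.account_number
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number INTEGER, address TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", ROWS)
result = conn.execute(EXISTS_QUERY).fetchall()
print(result)
```

Because EXISTS only tests whether a match exists, each outer row is returned at most once, so no DISTINCT or GROUP BY is needed.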
Answer 3 (score: 1)
Below is a performance test of the three working answers. EXISTS clearly outperforms LEFT JOIN / GROUP BY:
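Table with 100k rows and 1000 distinct values for b.
The performance gap grows with the number of rows; fewer duplicates mean a smaller difference.
No indexes.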
CREATE TABLE tbl (a text, b text);
INSERT INTO tbl
SELECT (random()*10000)::int::text
,(random()*1000)::int || ' some more text here'
FROM generate_series(1, 100000) g;
1. LEFT JOIN / GROUP BY / HAVING
EXPLAIN ANALYZE
SELECT t.a, t.b
FROM tbl t
LEFT join tbl t2 on t2.b = t.b and t2.a <> t.a
GROUP by t.a, t.b
HAVING count(t2.a) >= 1;
2. JOIN / GROUP BY
EXPLAIN ANALYZE
SELECT t.a, t.b
FROM tbl t
JOIN tbl t2 ON t2.b = t.b AND t2.a <> t.a
GROUP BY t.a, t.b;
3. EXISTS
EXPLAIN ANALYZE
SELECT *
FROM tbl t
WHERE EXISTS (
SELECT *
FROM tbl t2
WHERE t2.a <> t.a
AND t2.b = t.b
);
4. DISTINCT
EXPLAIN ANALYZE
SELECT DISTINCT t.a, t.b
FROM tbl t
JOIN tbl t2 on t2.b = t.b and t2.a <> t.a;
-> SQLfiddle displaying EXPLAIN ANALYZE output for the queries
Adding a multicolumn index (SQLfiddle) ..
CREATE INDEX a_b_idx ON tbl(b, a);
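.. does not change the runtime. Postgres does not use the index. It evidently expects a sequential table scan to be faster, since the whole table has to be read anyway.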
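Besides the execution times, note the row counts, which prove my point: the JOIN creates many intermediate duplicates that the EXISTS version avoids from the start: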
EXPLAIN ANALYZE output for 1. (LEFT JOIN / GROUP BY / HAVING):
HashAggregate  (cost=230601.26..230726.26 rows=10000 width=31) (actual time=12127.090..12183.087 rows=99476 loops=1)
  Filter: (count(t2.a) >= 1)
  ->  Hash Left Join  (cost=3670.00..154661.89 rows=10125250 width=31) (actual time=99.591..5897.744 rows=9991102 loops=1)
        Hash Cond: (t.b = t2.b)
        Join Filter: (t2.a <> t.a)
        Rows Removed by Join Filter: 101052
        ->  Seq Scan on tbl t  (cost=0.00..1736.00 rows=100000 width=27) (actual time=0.036..36.197 rows=100000 loops=1)
        ->  Hash  (cost=1736.00..1736.00 rows=100000 width=27) (actual time=99.141..99.141 rows=100000 loops=1)
              Buckets: 2048  Batches: 8  Memory Usage: 784kB
              ->  Seq Scan on tbl t2  (cost=0.00..1736.00 rows=100000 width=27) (actual time=0.004..44.899 rows=100000 loops=1)
Total runtime: 12208.954 ms
EXPLAIN ANALYZE output for 3. (EXISTS):
Hash Semi Join  (cost=3670.00..7783.00 rows=1 width=27) (actual time=81.630..247.371 rows=100000 loops=1)
  Hash Cond: (t.b = t2.b)
  Join Filter: (t2.a <> t.a)
  Rows Removed by Join Filter: 1009
  ->  Seq Scan on tbl t  (cost=0.00..1736.00 rows=100000 width=27) (actual time=0.010..32.758 rows=100000 loops=1)
  ->  Hash  (cost=1736.00..1736.00 rows=100000 width=27) (actual time=81.388..81.388 rows=100000 loops=1)
        Buckets: 2048  Batches: 8  Memory Usage: 784kB
        ->  Seq Scan on tbl t2  (cost=0.00..1736.00 rows=100000 width=27) (actual time=0.003..32.114 rows=100000 loops=1)
Total runtime: 272.508 ms
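The difference in intermediate row counts can be reproduced in miniature. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, so it says nothing about timings (SQLite's planner is different); it only counts how many rows the plain self-join produces before any grouping, compared with how many rows the EXISTS query returns. The table is the question's accounts table (columns assumed to be account_number and address) plus a hypothetical third account, 78901, at 55 Bee St.

```python
import sqlite3

ROWS = [
    (12345, "55 Bee St"),
    (23456, "94 Water way"),
    (34567, "15 Beagle Drive"),
    (45678, "55 Bee St"),
    (56789, "94 Water way"),
    (67890, "12 Green St"),
    (78901, "55 Bee St"),  # hypothetical third account at the same address
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number INTEGER, address TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", ROWS)

# Rows the join emits before GROUP BY / DISTINCT collapse them:
# one per (account, other account at the same address) pair.
join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM accounts t
    JOIN accounts t2
      ON t2.address = t.address AND t2.account_number <> t.account_number
""").fetchone()[0]

# Rows the semi-join returns: each qualifying account exactly once.
exists_rows = conn.execute("""
    SELECT COUNT(*)
    FROM accounts t
    WHERE EXISTS (
        SELECT 1 FROM accounts t2
        WHERE t2.address = t.address AND t2.account_number <> t.account_number
    )
""").fetchone()[0]

print(join_rows, exists_rows)  # prints: 8 5
```

An address shared by n accounts contributes n × (n − 1) join rows but only n EXISTS rows, which is why the gap widens as the number of duplicates per value grows.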
Answer 4 (score: 0)
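You need a HAVING clause. PostgreSQL won't accept SELECT * together with GROUP BY address (the account number is neither grouped nor aggregated), so do the grouping in a subquery:
SELECT * FROM accounts
WHERE address IN (
  SELECT address FROM accounts GROUP BY address HAVING COUNT(*) > 1
);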
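A minimal sketch of the GROUP BY / HAVING approach, using Python's sqlite3 module as a stand-in for PostgreSQL (column names account_number and address assumed). The grouped query on its own returns only the shared addresses with their counts; wrapping it in an IN subquery maps those addresses back to the individual accounts the question asks for.

```python
import sqlite3

ROWS = [
    (12345, "55 Bee St"),
    (23456, "94 Water way"),
    (34567, "15 Beagle Drive"),
    (45678, "55 Bee St"),
    (56789, "94 Water way"),
    (67890, "12 Green St"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number INTEGER, address TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", ROWS)

# One row per address that occurs more than once.
shared = conn.execute(
    "SELECT address, COUNT(*) FROM accounts "
    "GROUP BY address HAVING COUNT(*) > 1 ORDER BY address"
).fetchall()

# Every account whose address is in that set.
accounts = conn.execute("""
    SELECT * FROM accounts
    WHERE address IN (
        SELECT address FROM accounts GROUP BY address HAVING COUNT(*) > 1
    )
    ORDER BY account_number
""").fetchall()

print(shared)
print(accounts)
```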
Answer 5 (score: 0)
I believe you're looking for the HAVING clause:
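select address, count(accountnumber) from accounts group by address having count(accountnumber) > 1
This lists each shared address with its number of accounts; counting the account numbers, rather than summing them, is what detects an address used more than once.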
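A small sketch of why the aggregate in the HAVING clause matters, again using Python's sqlite3 module as a stand-in for PostgreSQL with assumed column names account_number and address. Since every account number in the sample is greater than 1, filtering on SUM keeps every address, while filtering on COUNT keeps only the shared ones.

```python
import sqlite3

ROWS = [
    (12345, "55 Bee St"),
    (23456, "94 Water way"),
    (34567, "15 Beagle Drive"),
    (45678, "55 Bee St"),
    (56789, "94 Water way"),
    (67890, "12 Green St"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number INTEGER, address TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", ROWS)

# SUM of account numbers exceeds 1 for every group, so nothing is filtered out.
by_sum = conn.execute(
    "SELECT address FROM accounts GROUP BY address "
    "HAVING SUM(account_number) > 1 ORDER BY address"
).fetchall()

# COUNT of rows per address exceeds 1 only for addresses used more than once.
by_count = conn.execute(
    "SELECT address, COUNT(account_number) FROM accounts GROUP BY address "
    "HAVING COUNT(account_number) > 1 ORDER BY address"
).fetchall()

print(by_sum)
print(by_count)
```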