考虑一个包含以下列的数据库表:
数据库代表Math Genealogy Project的数据,每个数学家通常只有一个顾问,但有两个顾问的情况。
视觉辅助使事情更清晰:
如何计算每位数学家的后代数?
我应该使用公用表格表达式( WITH RECURSIVE ),但我现在几乎陷入困境。我发现的所有类似例子都涉及只有一个父级而不是两个父级的层次结构。
更新
我将Vladimir Baranov提供的SQL Server解决方案改编为也适用于PostgreSQL:
WITH RECURSIVE cte AS (
SELECT m.id as start_id,
m.id,
m.name,
m.advisor1,
m.advisor2,
1 AS level
FROM public.mathematicians AS m
UNION ALL
SELECT cte.start_id,
m.id,
m.name,
m.advisor1,
m.advisor2,
cte.level + 1 AS level
FROM public.mathematicians AS m
INNER JOIN cte ON cte.id = m.advisor1
OR cte.id = m.advisor2
),
cte_distinct AS (
SELECT DISTINCT start_id, id
FROM cte
)
SELECT cte_distinct.start_id,
m.name,
COUNT(*)-1 AS descendants_count
FROM cte_distinct
INNER JOIN public.mathematicians AS m ON m.id = cte_distinct.start_id
GROUP BY cte_distinct.start_id, m.name
ORDER BY cte_distinct.start_id
答案 0 :(得分:4)
您没有说出您使用的DBMS。我将在此示例中使用SQL Server,但它也适用于支持递归查询的其他数据库。
示例数据
我从Euler开始只输入你树的右边部分。
最有趣的部分是Lagrange和Dirichlet之间的多条路径。
DECLARE @T TABLE (ID int, name nvarchar(50), Advisor1ID int, Advisor2ID int);
INSERT INTO @T (ID, name, Advisor1ID, Advisor2ID) VALUES
(1, 'Euler', NULL, NULL),
(2, 'Lagrange', 1, NULL),
(3, 'Laplace', NULL, NULL),
(4, 'Fourier', 2, NULL),
(5, 'Poisson', 2, 3),
(6, 'Dirichlet', 4, 5),
(7, 'Lipschitz', 6, NULL),
(8, 'Klein', NULL, 7),
(9, 'Lindemann', 8, NULL),
(10, 'Furtwangler', 8, NULL),
(11, 'Hilbert', 9, NULL),
(12, 'Taussky-Todd', 10, NULL);
这就是它的样子:
SELECT * FROM @T;
+----+--------------+------------+------------+
| ID | name | Advisor1ID | Advisor2ID |
+----+--------------+------------+------------+
| 1 | Euler | NULL | NULL |
| 2 | Lagrange | 1 | NULL |
| 3 | Laplace | NULL | NULL |
| 4 | Fourier | 2 | NULL |
| 5 | Poisson | 2 | 3 |
| 6 | Dirichlet | 4 | 5 |
| 7 | Lipschitz | 6 | NULL |
| 8 | Klein | NULL | 7 |
| 9 | Lindemann | 8 | NULL |
| 10 | Furtwangler | 8 | NULL |
| 11 | Hilbert | 9 | NULL |
| 12 | Taussky-Todd | 10 | NULL |
+----+--------------+------------+------------+
<强>查询强>
这是一个经典的递归查询,有两个有趣的点。
1)CTE
的递归部分使用Advisor1ID
和Advisor2ID
连接到锚点部分:
INNER JOIN CTE
ON CTE.ID = T.Advisor1ID
OR CTE.ID = T.Advisor2ID
2)由于可能有多个到后代的路径,递归查询可能会多次输出该节点。为了消除这些重复项,我在DISTINCT
中使用了CTE_Distinct
。有可能更有效地解决它。
更好地了解查询的工作原理,分别运行每个CTE并检查中间结果。
WITH
CTE
AS
(
SELECT
T.ID AS StartID
,T.ID
,T.name
,T.Advisor1ID
,T.Advisor2ID
,1 AS Lvl
FROM @T AS T
UNION ALL
SELECT
CTE.StartID
,T.ID
,T.name
,T.Advisor1ID
,T.Advisor2ID
,CTE.Lvl + 1 AS Lvl
FROM
@T AS T
INNER JOIN CTE
ON CTE.ID = T.Advisor1ID
OR CTE.ID = T.Advisor2ID
)
,CTE_Distinct
AS
(
SELECT DISTINCT
StartID
,ID
FROM CTE
)
SELECT
CTE_Distinct.StartID
,T.name
,COUNT(*) AS DescendantCount
FROM
CTE_Distinct
INNER JOIN @T AS T ON T.ID = CTE_Distinct.StartID
GROUP BY
CTE_Distinct.StartID
,T.name
ORDER BY CTE_Distinct.StartID;
<强>结果强>
+---------+--------------+-----------------+
| StartID | name | DescendantCount |
+---------+--------------+-----------------+
| 1 | Euler | 11 |
| 2 | Lagrange | 10 |
| 3 | Laplace | 9 |
| 4 | Fourier | 8 |
| 5 | Poisson | 8 |
| 6 | Dirichlet | 7 |
| 7 | Lipschitz | 6 |
| 8 | Klein | 5 |
| 9 | Lindemann | 2 |
| 10 | Furtwangler | 2 |
| 11 | Hilbert | 1 |
| 12 | Taussky-Todd | 1 |
+---------+--------------+-----------------+
此处DescendantCount
将节点本身计为后代。如果要为叶节点看0而不是1,则可以从此结果中减去1。
这是SQL Fiddle。