Hierarchical recursive query in MySQL

Posted: 2017-06-27 11:22:47

Tags: mysql hierarchical-data

I have a MySQL table like the following:

NCBI_TAXON_ID   PARENT_ID   TAXON_NAME  TAXON_STRAIN    RANK    
1   1   root        no rank
2759    131567  Eukaryota       superkingdom
6072    33208   Eumetazoa       no rank
7711    33511   Chordata        phylum
7742    89593   Vertebrata      no rank
7776    7742    Gnathostomata       no rank
8287    117571  Sarcopterygii       no rank
9347    32525   Eutheria        no rank
9443    314146  Primates        order
9526    314293  Catarrhini      parvorder
9604    314295  Hominidae       family
9605    207598  Homo        genus
9606    9605    Homo sapiens        species
32523   1338369 Tetrapoda       no rank
32524   32523   Amniota     no rank
32525   40674   Theria      no rank
33154   2759    Opisthokonta        no rank
33208   33154   Metazoa     kingdom
33213   6072    Bilateria       no rank
33511   33213   Deuterostomia       no rank
40674   32524   Mammalia        class
89593   7711    Craniata        subphylum
117570  7776    Teleostomi      no rank
117571  117570  Euteleostomi        no rank
131567  1   cellular organisms      no rank
207598  9604    Homininae       subfamily
314146  1437010 Euarchontoglires        superorder
314293  376913  Simiiformes     infraorder
314295  9526    Hominoidea      superfamily
376913  9443    Haplorrhini     suborder

The data above is hierarchical. For example, to find the lineage of 'Homo sapiens', I follow the PARENT_ID links, i.e. 9605 and so on:

9606    9605    Homo sapiens        species
9605    207598  Homo        genus
207598  9604    Homininae       subfamily
.
.
.
1   1   root        no rank

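I hope I have explained the data model correctly.

Now I want to retrieve the whole hierarchy by supplying a taxon_name, e.g. 'Homo sapiens', with the rows in hierarchical order.

Can this be done in MySQL? Any help is appreciated.

1 answer:

Answer 0: (score: 0)

Recursive CTEs (WITH RECURSIVE) were introduced in MySQL 8.0 (and MariaDB 10.2.2). With such an expression, you can easily achieve what you want: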

WITH RECURSIVE ancestors AS (
    SELECT * FROM taxonomy
    WHERE taxon_name = @desired_taxon

    UNION DISTINCT

    SELECT t.* FROM
        taxonomy    AS t,
        ancestors   AS a
    WHERE t.ncbi_taxon_id = a.parent_id
) SELECT * FROM ancestors;

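You must use UNION DISTINCT rather than UNION [ALL] to avoid an infinite loop once the root record is reached: the root is its own parent, so the recursive step would otherwise keep producing it. If you can live without the root record in the result set, or simply don't want it, you could use the following instead: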

WITH RECURSIVE ancestors AS (
    SELECT * FROM taxonomy
    WHERE taxon_name = @desired_taxon

    UNION

    SELECT t.* FROM
        taxonomy    AS t,
        ancestors   AS a
    WHERE
            t.ncbi_taxon_id = a.parent_id
        AND t.ncbi_taxon_id != 1
) SELECT * FROM ancestors;
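
Neither query above states an ORDER BY, while the question asks for the rows in hierarchical order. One possible way to get that, sketched here under assumptions, is to carry a depth counter through the recursion and sort on it. Because the counter makes every row distinct, UNION DISTINCT can no longer stop the loop at the self-referencing root, so the sketch stops explicitly when a row is its own parent. The harness is hypothetical: it uses Python's sqlite3 module (SQLite also accepts WITH RECURSIVE) in place of a MySQL 8.0 server, loads only five rows copied from the question, omits the TAXON_STRAIN and RANK columns, and uses a `?` placeholder where MySQL would use @desired_taxon. The names `lineage`, `ROWS`, and `ORDERED_ANCESTORS` are illustrative.

```python
import sqlite3

# Five rows copied from the question's table: (ncbi_taxon_id, parent_id, taxon_name).
# The root row is its own parent, as in the original data.
ROWS = [
    (1, 1, "root"),
    (131567, 1, "cellular organisms"),
    (2759, 131567, "Eukaryota"),
    (33154, 2759, "Opisthokonta"),
    (33208, 33154, "Metazoa"),
]

# depth is 0 for the requested taxon and grows by 1 per ancestor.
# The WHERE clause stops recursing from a row that is its own parent (the root).
ORDERED_ANCESTORS = """
WITH RECURSIVE ancestors AS (
    SELECT t.ncbi_taxon_id, t.parent_id, t.taxon_name, 0 AS depth
    FROM taxonomy AS t
    WHERE t.taxon_name = ?

    UNION ALL

    SELECT t.ncbi_taxon_id, t.parent_id, t.taxon_name, a.depth + 1
    FROM taxonomy AS t
    JOIN ancestors AS a ON t.ncbi_taxon_id = a.parent_id
    WHERE a.ncbi_taxon_id <> a.parent_id
)
SELECT taxon_name, depth FROM ancestors ORDER BY depth
"""


def lineage(taxon_name):
    """Return (taxon_name, depth) pairs from the given taxon up to the root."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonomy ("
        " ncbi_taxon_id INTEGER PRIMARY KEY,"
        " parent_id INTEGER,"
        " taxon_name TEXT)"
    )
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(ORDERED_ANCESTORS, (taxon_name,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, depth in lineage("Metazoa"):
        print(depth, name)
```

The SQL text itself uses only standard recursive CTE syntax, so it should carry over to MySQL 8.0 with the placeholder swapped for @desired_taxon, though that has not been checked against a MySQL server here.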

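Further, it appears that records in your dataset have rank = 'no rank' if and only if taxon_strain is NULL (or ''). If that is the case, you could simplify the database schema by dropping the rank column and inferring its value from whether taxon_strain is defined. Also, parent_id should reference ncbi_taxon_id, and arguably the root record's parent_id should be NULL, which would also let your recursive query do without UNION DISTINCT, should you want the root record to appear in the result set.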

In particular, this is the schema I would use:

CREATE TABLE taxonomy (
    id          INTEGER UNSIGNED    PRIMARY KEY,
    parent_id   INTEGER UNSIGNED,
    name        VARCHAR(32),
    strain      VARCHAR(32),

    FOREIGN KEY (parent_id) REFERENCES taxonomy (id)
);

INSERT INTO taxonomy VALUES
    (1               , NULL        , 'root'                , NULL             ),
    (131567          , 1           , 'cellular organisms'  , NULL             ),
    (2759            , 131567      , 'Eukaryota'           , 'superkingdom'   ),
    (33154           , 2759        , 'Opisthokonta'        , NULL             ),
    (33208           , 33154       , 'Metazoa'             , 'kingdom'        ),
    (6072            , 33208       , 'Eumetazoa'           , NULL             ),
    (33213           , 6072        , 'Bilateria'           , NULL             ),
    (33511           , 33213       , 'Deuterostomia'       , NULL             ),
    (7711            , 33511       , 'Chordata'            , 'phylum'         ),
    (89593           , 7711        , 'Craniata'            , 'subphylum'      ),
    (7742            , 89593       , 'Vertebrata'          , NULL             ),
    (7776            , 7742        , 'Gnathostomata'       , NULL             ),
    (117570          , 7776        , 'Teleostomi'          , NULL             ),
    (117571          , 117570      , 'Euteleostomi'        , NULL             ),
    (8287            , 117571      , 'Sarcopterygii'       , NULL             ),
    .
    .
    .
    ( ... );

Then your recursive query can be the following:

WITH RECURSIVE ancestors AS (
    SELECT * FROM taxonomy
    WHERE name = @desired_taxon

    UNION

    SELECT t.* FROM
        taxonomy    AS t,
        ancestors   AS a
    WHERE t.id = a.parent_id
) SELECT * FROM ancestors;

Such a query is guaranteed to terminate at the row whose parent_id is NULL, because no row's id can ever equal it: as the primary key, id must not be NULL.
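
The termination argument can be checked with a small experiment. The following is a minimal sketch, not part of the original answer: it again assumes Python's sqlite3 module as a stand-in for MySQL 8.0, loads only the first three rows of the INSERT statement, leaves out the strain column, and deliberately uses UNION ALL so that no deduplication is involved in stopping the recursion. The function name `ancestors_with_null_root` is illustrative.

```python
import sqlite3

# UNION ALL on purpose: nothing is deduplicated, so the recursion can only
# stop because no row's id matches the root's NULL parent_id.
NULL_ROOT_QUERY = """
WITH RECURSIVE ancestors AS (
    SELECT id, parent_id, name FROM taxonomy WHERE name = ?

    UNION ALL

    SELECT t.id, t.parent_id, t.name
    FROM taxonomy AS t
    JOIN ancestors AS a ON t.id = a.parent_id
)
SELECT name FROM ancestors
"""


def ancestors_with_null_root(name):
    """Return the set of names on the path from the given taxon to the root."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonomy ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES taxonomy (id),"
        " name TEXT)"
    )
    # First three rows of the answer's INSERT, without the strain column.
    conn.executemany(
        "INSERT INTO taxonomy VALUES (?, ?, ?)",
        [
            (1, None, "root"),
            (131567, 1, "cellular organisms"),
            (2759, 131567, "Eukaryota"),
        ],
    )
    names = {row[0] for row in conn.execute(NULL_ROOT_QUERY, (name,))}
    conn.close()
    return names


if __name__ == "__main__":
    print(ancestors_with_null_root("Eukaryota"))
```

The result is collected into a set because the query has no ORDER BY, so the order of the returned rows is not something to rely on.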