Question

I came across an article about join decomposition.

Scenario #1 (not so good):

Select * from tag
Join tag_post ON tag_post.tag_id=tag.id
Join post ON tag_post.post_id=post.id
Where tag.tag='mysql'

Scenario #2 (good):

Select * from tag where tag='mysql'

Select * from tag_post Where tag_id=1234

Select * from post where post.id in (123,456,9098,545)

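The article suggests sticking with scenario #2 for many reasons, caching in particular. The question is how to do the join in our application once the rows have been retrieved separately. Could you give us an example in PHP? (I have read "MyISAM Performance: Join Decomposition?" but it did not help.)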

Answer 1

You can use SQL subselects (if I understand your question correctly). Doing this in PHP would be odd when SQL already has everything you need.

SELECT *
FROM `post`
WHERE `id` IN (
    SELECT `post_id`
    FROM `tag_post`
    WHERE `tag_id` = (
        SELECT `id`
        FROM `tag`
        WHERE `tag` = 'mysql'
    )
)

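I'm not sure what your database structure looks like, but this should get you started. It is pretty much SQL inception: a query within a query. You can select data using the results of a subselect.

Please verify all table and column names before copying this SQL and telling me it doesn't work.

Before anyone starts crying about speed, caching and efficiency: I think this is fairly efficient. Instead of selecting all the data and looping through it in PHP, you select only the smaller bits you need with native SQL, as it was meant to be used.

Again, I strongly discourage using PHP to pick out specific data. SQL is all you need.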

Edit: here is your script

Assume you have some multidimensional arrays containing all the data:

// dummy results

// table tag
$tags = array(
    // first record
    array(
        'id'    => 0,
        'tag'   => 'mysql'
    ), 
    // second record
    array(
        'id'    => 1,
        'tag'   => 'php'
    )
    // etc
);

// table tag_post
$tag_posts = array(
    // first record
    array(
        'id'        => 0,
        'post_id'   => 0,   // post #1
        'tag_id'    => 0    // has tag mysql
    ),
    // second record
    array(
        'id'        => 1,
        'post_id'   => 1,   // post #2
        'tag_id'    => 0    // has tag mysql
    ),
    // third record
    array(
        'id'        => 2,
        'post_id'   => 2,   // post #3
        'tag_id'    => 1    // has tag php
    )
    // etc
);

// table post
$posts = array(
    // first record
    array(
        'id'        => 0,
        'content'   => 'content post #1'
    ),
    // second record
    array(
        'id'        => 1,
        'content'   => 'content post #2'
    ),
    // third record
    array(
        'id'        => 2,
        'content'   => 'content post #3'
    )
    // etc
);

// searching for tag
$tag = 'mysql';
$tagid = -1;
$postids = array();
$results = array();

// first get the id of this tag
foreach($tags as $key => $value) {
    if($value['tag'] === $tag) {
        // set the id of the tag
        $tagid = $value['id'];

        // there's only one possible id, so we break the loop
        break;
    }
}

// get post ids using the tag id
if($tagid > -1) { // verify if a tag id was found
    foreach($tag_posts as $key => $value) {
        if($value['tag_id'] === $tagid) {
            // add post id to post ids
            $postids[] = $value['post_id'];
        }
    }
}

// finally get post content
if(count($postids) > 0) { //verify if some posts were found
    foreach($posts as $key => $value) {
        // check if the id of the post can be found in the posts ids we have found
        if(in_array($value['id'], $postids)) {
            // add all data of the post to result
            $results[] = $value;
        }
    }
}
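
For comparison, the three loops above can be written more compactly with keyed lookups. This is only a sketch under a few assumptions: `findPostsByTag` is a hypothetical helper name, the rows have the same shape as the dummy arrays above, and PHP 7.4 or later is available for the arrow function. Ids are cast to `int` because database drivers often return them as strings.

```php
<?php
// Same dummy data shape as the script above, shortened.
$tags = [
    ['id' => 0, 'tag' => 'mysql'],
    ['id' => 1, 'tag' => 'php'],
];
$tag_posts = [
    ['id' => 0, 'post_id' => 0, 'tag_id' => 0],
    ['id' => 1, 'post_id' => 1, 'tag_id' => 0],
    ['id' => 2, 'post_id' => 2, 'tag_id' => 1],
];
$posts = [
    ['id' => 0, 'content' => 'content post #1'],
    ['id' => 1, 'content' => 'content post #2'],
    ['id' => 2, 'content' => 'content post #3'],
];

// Hypothetical helper, not part of the original script.
function findPostsByTag(string $tag, array $tags, array $tagPosts, array $posts): array
{
    // Map tag name => tag id, e.g. ['mysql' => 0, 'php' => 1].
    $tagIdByName = array_column($tags, 'id', 'tag');
    if (!array_key_exists($tag, $tagIdByName)) {
        return []; // unknown tag, nothing to return
    }
    $tagId = (int) $tagIdByName[$tag];

    // Store the wanted post ids as array keys so each lookup is a hash lookup
    // instead of an in_array() scan.
    $wanted = [];
    foreach ($tagPosts as $row) {
        if ((int) $row['tag_id'] === $tagId) {
            $wanted[(int) $row['post_id']] = true;
        }
    }

    // Keep only the posts whose id is among the wanted keys.
    return array_values(array_filter(
        $posts,
        fn(array $post): bool => isset($wanted[(int) $post['id']])
    ));
}

$results = findPostsByTag('mysql', $tags, $tag_posts, $posts);
```

With the dummy data, `$results` holds the rows for posts #1 and #2, the same as the longer script.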

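If you look at the length of the script above, that is exactly why I would stick with SQL.

Now, as I recall, you wanted to do the join in PHP rather than in SQL. The script above is not a join; it just collects results using some arrays. I know, but an actual join would only be a waste of time and less efficient than leaving the results as they are.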
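
If you do want the rows stitched together after running the separate queries from scenario #2, it could look roughly like the sketch below. Both function names are hypothetical, the `content` column is taken from the dummy data above, and actually running the query (for example with PDO) is left out; the first helper only builds the SQL string and the values to bind.

```php
<?php
// Hypothetical helper: builds the third query of scenario #2 for a list of
// post ids, using one "?" placeholder per id so values can be bound safely.
function buildPostQuery(array $postIds): array
{
    if (count($postIds) === 0) {
        // "IN ()" is not valid SQL, so the caller should skip the query.
        throw new InvalidArgumentException('No post ids given');
    }
    $placeholders = implode(',', array_fill(0, count($postIds), '?'));
    $sql = "SELECT * FROM post WHERE id IN ($placeholders)";

    return [$sql, array_values($postIds)];
}

// Hypothetical helper: the "join" step. Attaches the tag row to every post
// row, which is what the SQL join in scenario #1 would have returned.
function joinTagToPosts(array $tagRow, array $postRows): array
{
    $joined = [];
    foreach ($postRows as $post) {
        $joined[] = [
            'tag_id'  => $tagRow['id'],
            'tag'     => $tagRow['tag'],
            'post_id' => $post['id'],
            'content' => $post['content'],
        ];
    }

    return $joined;
}

// Usage with values in the style of the question.
[$sql, $params] = buildPostQuery([123, 456, 9098, 545]);
$rows = joinTagToPosts(
    ['id' => 1234, 'tag' => 'mysql'],
    [['id' => 123, 'content' => 'first'], ['id' => 456, 'content' => 'second']]
);
```

Whether the stitching is needed at all depends on the application: often the tag row and the post rows can simply be used side by side.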

Edit: 21-12-12, as a result of the comments below

I ran a small benchmark and the results are quite stunning:

DATABASE RECORDS:
tags:           10
posts:          1000
tag_posts:      1000 (every post has 1 random tag)

Selecting all posts with a specific tag resulted in 82 records.

SUBSELECT RESULTS:
run time:                        0.772885084152
bytes downloaded from database:  3417

PHP RESULTS:
run time:                        0.086599111557
bytes downloaded from database:  48644



Please note that the benchmark had both the application and the database on the
same host. If you use different hosts for the application and the database layer,
the PHP result could end up taking longer, because sending data between two hosts
naturally takes much more time than when they are on the same host.

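Even though the subselect returns far less data, the request takes roughly 9 times longer...

I never expected these results, so I'm convinced, and I will certainly use this information when I know performance matters. I will still use SQL for smaller operations, though...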
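
To repeat this kind of comparison on your own data, a small timing wrapper is enough. The sketch below is not the code used for the numbers above; `timeIt` is a hypothetical helper and the workload is only a stand-in. `hrtime()` needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: runs $work once and returns [result, elapsed seconds].
function timeIt(callable $work): array
{
    $start = hrtime(true);            // monotonic clock, in nanoseconds
    $result = $work();
    $elapsed = (hrtime(true) - $start) / 1e9;

    return [$result, $elapsed];
}

// Stand-in workload. In a real comparison, one closure would run the
// subselect and another would run the separate queries and filter in PHP.
[$sum, $seconds] = timeIt(function (): int {
    return array_sum(range(1, 1000));
});

printf("result: %d, run time: %.6f s\n", $sum, $seconds);
```

A single run is noisy, so it helps to repeat each strategy several times and compare averages, ideally with the application and database hosts set up as in production.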
