使用左侧和右侧的嵌套集模型导入500 000个类别。

时间:2014-11-12 03:00:14

标签: php mysql performance

我有500 000个类别,其中包含以下mysql表结构

CREATE TABLE nested_category(         category_id INT AUTO_INCREMENT PRIMARY KEY,         name VARCHAR(20)NOT NULL,         lft INT NOT NULL,         rgt INT NOT NULL );

以下是左右http://mikehillyer.com/media//numbered_tree.png

的跟随方式

我有另一张桌子,我得到的行数据低于结构。

CREATE TABLE IF NOT NOT EXISTS tbl_categoriesid int(111)NOT NULL,   category_id int(111)NOT NULL,   category_path varchar(255)NOT NULL,   category_name varchar(255)NOT NULL,   status int(1)NOT NULL,   tdate datetime NOT NULL )ENGINE = MyISAM AUTO_INCREMENT = 498013 DEFAULT CHARSET = latin1;

哪个有这样的数据

id  category_id category_path               category_name   status  tdate
1   381773      Top/Arts                Arts        0   2014-10-07 11:11:05
2   423945      Top/Arts/Animation          Animation   0   2014-10-07 11:11:21
3   425085      Top/Arts/Animation/Anime        Anime       0   2014-10-07 11:11:34
4   425147      Top/Arts/Animation/Anime/Characters     Characters  0   2014-10-07 11:11:35

我在下面有php脚本来插入数据

echo $sql = "Select * from tbl_categories where status='1' and`category_path` LIKE 'Top/Arts%' order by id limit 0,5";
$resCategory = select($sql);
for($k=0;$k<count($resCategory);$k++){

    $rec_id =$resCategory[$k]["id"];
    $category_id =$resCategory[$k]["category_id"];
    $category_path =$resCategory[$k]["category_path"];
    $category_name =$resCategory[$k]["category_name"];
    $friendly_url = getAlias($category_name);
    //exit;
    $cate_path_arr = explode("/",$category_path);
    $parent_full_friendly_url = "";
    $pad="";
    for($i=1;$i<count($cate_path_arr)-1;$i++){
        $parent_full_friendly_url .= $pad.getAlias($cate_path_arr[$i]);
        $pad="/";

    }

    $full_friendly_url = $parent_full_friendly_url.$pad.$friendly_url;

    if($cate_path_arr[count($cate_path_arr)-2]  == "Top"){
        $parent_category_id = 0;
    }else{
        $parentCatName  = $cate_path_arr[count($cate_path_arr)-2] ;     
        $sql = "Select id,root_id from nested_category where title = \"".$parentCatName."\" and full_friendly_url=\"".$parent_full_friendly_url."\"";
        $parentCat = select($sql);
        $parent_category_id = $parentCat[0]["id"];
        $root_id = $parentCat[0]["root_id"];

    }

    $sql = "INSERT INTO nested_category SET ";
    $sql .= "root_id = \"".$root_id."\", ";
    $sql .= "title = \"".$category_name."\", ";
    $sql .= "category_id = \"".$parent_category_id."\", ";
    $sql .= "page_title = \"".$category_name."\", ";
    $sql .= "friendly_url = \"".$friendly_url."\", ";
    $sql .= "full_friendly_url = \"".$full_friendly_url."\", ";
    $sql .= "featured = \"y\", ";
    $sql .= "enabled = \"y\" ";
    $db->query($sql);
    $lastInsId = mysql_insert_id();


    $root_id = findRootCategoryId($lastInsId);

    if($parent_category_id > 0){
        $sql = "UPDATE nested_category SET count_sub = count_sub+1 where id=\"".$root_id."\"";
        $db->query($sql);
    }else{
        $sql = "UPDATE nested_category SET root_id = \"".$lastInsId."\" where id=\"".$lastInsId."\"";
        $db->query($sql);
    }

    rebuildCategoryTree($root_id, 1,$lastInsId);

    $sql = "UPDATE tbl_categories SET status = '0',tdate=\"".date("Y-m-d h:i:s")."\" where id=\"".$rec_id."\"";
    $db->query($sql);


}

function getAlias($catTitle){

    $alias = preg_replace("/(\s){2,}/",'$1',$catTitle);                 
    $alias = str_replace('-', '_', $alias);         
    $alias = preg_replace(array('/\s+/','/[^A-Za-z0-9\-]/'), array('-','-'), $alias);           
    $page_alias = trim(strtolower($alias));

    return $page_alias;
}

function findRootCategoryId($category_id) {
            global $db,$dbSelect;
            $category_id = str_replace("'","",$category_id);
            while($category_id != 0) {
                $sql = "SELECT category_id, id FROM nested_category WHERE id = $category_id";               
                $row = select($sql);
                $category_id = $row[0]["category_id"];
                $root_category_id = $row[0]["id"];
            }
            return $root_category_id;
}


function rebuildCategoryTree($category_id, $node_left,$thisId) {
            global $db,$dbSelect;

            if (($category_id > 0) or ($thisId > 0)) {

                $category_id = ($category_id>0)?$category_id:$thisId;
                $node_left = ($node_left>0)?$node_left:1;
                $root_category_id = findRootCategoryId($category_id);

                $sql = 'UPDATE nested_category SET root_id = '.$root_category_id.' WHERE id='.$category_id;
                $db->query($sql);

                $node_right = $node_left+1;

                $sql = 'SELECT id FROM nested_category WHERE category_id= '.$category_id;
                $resParentCat = select($sql);

                for ($i=0;$i<count($resParentCat);$i++) {           
                    $node_right = rebuildCategoryTree($resParentCat[$i]['id'], $node_right,$thisId);
                }

                $sql = 'UPDATE nested_category SET `left` = '.$node_left.', `right` = '.$node_right.', root_id = '.$root_category_id.' WHERE  id = '.$category_id;
                $db->query($sql);



                return $node_right+1;

            }

        }
?>

我的问题是,有没有什么方法可以加快数据插入,因为它通过计算左右数据来花费太多时间来插入数据。

我知道我第一次发布了这个问题,我需要你们给我们所有细节。请让我知道您宝贵的意见。

先谢谢。

1 个答案:

答案 0 :(得分:-1)

不要让任何人拽你的链子。对于相对稳定的数据,嵌套集远远优于邻接列表。这是我用来将80万行AL转换为NS的脚本。它通常在6-8分钟内运行。它最初是用T-SQL编写的,我已做了一些更改,使其(或多或少)与您的架构一起工作(如您所示)。它需要在MySql中运行才能工作。在任何情况下,都不需要功能或存储过程。您需要的一切都可以直接从输入表中获取。

我对您的设计进行了两处更改。首先,您的示例数据看起来具有非常好的识别值,因此我使用它而不是Identity字段。其次,我添加了SetID列来维护每个集合。所以完成的数据看起来像这样:

id      SetID  Name        lft rgt
381773  Arts   Arts        0   11
423945  Arts   Animation   1   6
425085  Arts   Anime       2   5
425147  Arts   Character   3   4
132456  Arts   2D          7   10
123457  Arts   Painting    8   9
381773  Crafts Crafts      0   7
423945  Crafts Clay        1   6
425085  Crafts Structures  2   3
425147  Crafts Pottery     4   5

整个表格不应该定义一棵树而是一棵树林(我把它们称为#34;设置&#34;与#34;嵌套Set&#34;命名法一致),边界编号是唯一的每一套。每组中的每个子组都具有用于分组的设置标识符。边界编号在每个集合的开头重新开始,因此每个集合根的左边界值将为零。

代码如下。请记住,这比最终代码更像是模板。提问。

CREATE TABLE IF NOT EXISTS nested_category(
    ID      INT not null PRIMARY KEY,
    SetID   varchar( 20 ) not null,
    Name    VARCHAR( 20 ) NOT NULL,
    lft     INT NOT NULL,
    rgt     INT NOT NULL
);

-- This was originally written in T-SQL so some syntax modification will
-- be in order. Sorry.
declare
    @CatIDNew       int,
    @CatIDPrev      int = 0,
    @rgt            int,
    @SetID          varchar( 20 ),
    @NameNew        varchar( 20 ),
    @LevelNew       int,
    @LevelPrev      int = 0;

declare
    @Defs table(
        Lev     int,
        ID      int,
        Name    varchar( 20 ),
        lft     int
    );

declare SS cursor fast_forward for
    select  Category_ID, Category_Name, Length( Path ) - Length( Replace( Path, '/', '' )) as Category_Level
    from    tbl_categories
    --where   Category_Path like 'Top/Arts%' )  -- Uncomment to test a smaller sample
    order by Category_Path;

open ss;
fetch next from ss into @CatIDNew, @NameNew, @LevelNew;

-- Go ahead and store the first record of the table
insert into @Defs values( @LevelNew, @CatIDNew, @NameNew, 0 );
select  @LevelPrev  = @LevelNew,
        @CatIDPrev  = @CatIDNew,
        @SetID      = @NameNew,
        @rgt        = 1;
-- Get the second row
fetch next from ss into @CatIDNew, @NameNew, @LevelNew;

-- Now we enter the row with the first entry in the table variable. We're using this table as a stack.
-- As we read entries from the cursor, we push them "onto" the stack until we read an entry the same
-- or smaller level than the previous entry.
while @@FETCH_STATUS = 0 begin
        if @LevelNew = 1 begin
            -- Handle the special case when transitioning between sets
            insert into nested_category( ID, SetID, Name, lft, rgt )
            select  ID, @SetID, Name, lft, @rgt +( @LevelPrev - Lev )
            from    @Defs
            where   Lev <= @LevelPrev
            order by Lev desc;

            -- Clear the stack
            delete @Defs;
            insert @Defs values( @LevelNew, @CatIDNew, @NameNew, 0 );

            select  @LevelPrev  = @LevelNew,
                    @CatIDPrev  = @CatIDNew,
                    @SetID      = @NameNew,
                    @rgt        = 1;
        end
        else if @LevelNew > @LevelPrev begin
            -- This new standard is the next level down from the last one. Save the last one.
            Merge into @Defs d
                using (select @LevelNew) as Search( Lev )
                    on d.Lev = Search.Lev
                when matched then
                    update  set ID      = @CatIDNew,
                                Name    = @NameNew,
                                lft     = @rgt
                when not matched then
                    insert values( @LevelNew, @CatIDNew, @NameNew, @rgt );
            select  @LevelPrev  = @LevelNew,
                    @rgt     = @rgt + 1;
        end
        else begin
            -- This is on the same or higher level as the last one. We can write out the last one to the new level.
            insert into nested_category( ID, SetID, Name, lft, rgt )
            select  ID, @SetID, Name, lft, @rgt +( @LevelPrev - Lev )
            from    @Defs
            where   Lev between @LevelNew and @LevelPrev;
            -- Then insert/replace the new one (in case the next one is lower)
            select  @LevelPrev  = @LevelNew,
                    @CatIDPrev  = @CatIDNew,
                    @rgt = @rgt + @@ROWCOUNT;
            Merge into @Defs d
                using (select @LevelNew) as Search( Lev )
                    on d.Lev = Search.Lev
                when matched then
                    update  set ID      = @NameNew,
                                Name    = @NameNew,
                                lft     = @rgt
                when not matched then
                    insert values( @LevelNew, @CatIDNew, @NameNew, @rgt );
            set @rgt = @rgt + 1;
        end;

    -- Read in the next row
    fetch next from ss into @CatIDNew, @NameNew, @LevelNew;
end;

-- There are no more entries. Purge any remaining in the stack.
insert into nested_category( ID, SetID, Name, lft, rgt )
select  ID, @SetID, Name, lft, @rgt +( @LevelPrev - Lev )
from    @Defs
where   Lev <= @LevelPrev;

close ss;
deallocate ss;

-- Saving these definitions until after the table is loaded saves a lot of time.
alter table nested_category add
    constraint  PK_nested_category primary key clustered( ID ),
    constraint  IX_nested_category_lft unique( SetID, lft ),
    constraint  CK_nested_categoryBounds check( lft < rgt );
alter table nested_category add
    constraint  FK_nested_categoryOrgID foreign key( SetID )references StandardOrgs( SetID ),
    constraint  FK_nested_categoryGUID foreign key( ID )references StandardGuidXref ( EDID ),
    constraint  FK_nested_categorySubject foreign key( SubjectID )references StandardSubjects ( ID ),
    constraint  FK_nested_categoryLevel foreign key( SetID, LevelID )references StandardLevels ( SetID, LevelID );
create index    IX_nested_categoryBounds on nested_category ( SetID, lft, rgt ); -- oft-used covering index