MySQL stored procedure - processing millions of records - how to speed it up

Date: 2014-09-26 01:58:43

Tags: mysql sql stored-procedures database-performance

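I wrote a stored procedure (below) to process about 25 million records. It takes a given lat & lon, a distance (25 miles) and a number of records to assign (12). It finds all records inside the boundary defined by those 25 miles and assigns up to 12 of them to a user who has not been assigned any records yet. A user may have only one record per category, so the 12 records each have a different category.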
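The assignment rule just described can be modelled in memory to make it concrete. This is a minimal sketch, not the procedure itself: the function `assign_pins` and its arguments are hypothetical, and it only mirrors the procedure's logic (each user gets one pin from each of up to 12 categories, and assigned pins leave the pool).

```python
from collections import defaultdict


def assign_pins(candidates, user_ids, per_user=12):
    """Hypothetical in-memory model of the procedure's assignment loop.

    candidates: (pin_id, category_id) pairs found inside the search area.
    user_ids:   users that have not been processed yet, in processing order.
    Returns {user_id: [pin_id, ...]}; each user's pins have distinct categories.
    """
    # Pool of unassigned pins, grouped by category (like GROUP BY category_id).
    by_category = defaultdict(list)
    for pin_id, category_id in candidates:
        by_category[category_id].append(pin_id)

    assignments = {}
    for user_id in user_ids:
        # Categories that still have at least one unassigned pin.
        open_categories = [c for c, pins in by_category.items() if pins]
        if not open_categories:
            break  # pool exhausted; remaining users get nothing
        # One pin per category, at most per_user categories (like LIMIT numrecords).
        assignments[user_id] = [by_category[c].pop() for c in open_categories[:per_user]]
    return assignments
```

For example, `assign_pins([(1, "a"), (2, "a"), (3, "b"), (4, "c")], [10, 11, 12], per_user=2)` gives user 10 one pin from categories `a` and `b`, user 11 the remaining pins from `a` and `c`, and nothing to user 12 because the pool is empty.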

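The stored procedure works. The only problem is that it takes a very long time. To speed things up I created 8 copies of it in total, identical except for the working table (POSTSINAREATBL[1-8]), so they can run in parallel. I have now been running the scripts for 4 days and have processed only 3.5 million of the 25 million records.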

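I'm hoping someone has some insight and can help us speed this up. I really need all the records processed within the next 1-2 days, and at the current rate it would take almost a month!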
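The "almost a month" estimate follows directly from the figures above. A quick check, assuming the rate stays constant, also shows how much faster the job would have to run to meet the 1-2 day deadline:

```python
processed, total = 3_500_000, 25_000_000
days_so_far = 4

rate_per_day = processed / days_so_far          # 875,000 records/day (all 8 scripts combined)
days_for_all = total / rate_per_day             # about 28.6 days for the whole job
days_left = (total - processed) / rate_per_day  # about 24.6 more days at the current rate


def speedup_needed(days_available):
    """How many times faster the job must run to finish the rest in days_available."""
    return days_left / days_available


print(f"{speedup_needed(2):.1f}x for 2 days, {speedup_needed(1):.1f}x for 1 day")
```

Finishing the remaining 21.5 million records in 2 days needs roughly a 12x speedup (about 25x for 1 day), which is more than extra parallelism on an already saturated CPU can deliver; the per-record work itself has to shrink.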

Also, with the 8 scripts running my CPU sits at 99.8%, so I'm at full capacity.

DELIMITER $$

CREATE PROCEDURE `get_pins_in_boundaries`(IN mylon double, IN mylat double, IN dist int, IN numrecords int)
BEGIN
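    DECLARE isDone INT DEFAULT 0;

    -- DOUBLE to match the IN parameters (FLOAT keeps only about 7 significant digits)
    DECLARE lat DOUBLE;
    DECLARE lng DOUBLE;

    DECLARE lon1 DOUBLE;
    DECLARE lon2 DOUBLE;
    DECLARE lat1 DOUBLE;
    DECLARE lat2 DOUBLE;

    -- BIGINT to match POSTSINAREATBL.pin_id
    DECLARE this_iter_pin_id BIGINT;
    DECLARE use_this_user_id INT;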

    DECLARE num_results_in_area int;

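    -- one pin per category; MIN() makes the pick deterministic and is valid under ONLY_FULL_GROUP_BY
    DECLARE cur_posts_to_assign_to_user CURSOR FOR
        SELECT MIN(pin_id) FROM POSTSINAREATBL GROUP BY category_id LIMIT numrecords;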

    DECLARE CONTINUE HANDLER FOR NOT FOUND SET isDone = 1;

    IF mylon = 0.000000 OR mylat = 0.000000 THEN
        SELECT CONCAT('complete') AS results;
    ELSE

        -- lat/lng were swapped here (lat=mylon, lng=mylat), which put the search area in the wrong place
        SET lat = mylat;
        SET lng = mylon;

        -- calculate the bounding rectangle in miles (about 69 miles per degree of latitude);
        -- dist is in miles, consistent with the 3959-mile earth radius used below.
        -- The former second block used 111.04 (km per degree) and overwrote these values.
        SET lat1 = lat - dist / 69;
        SET lat2 = lat + dist / 69;
        SET lon1 = lng - dist / ABS(COS(RADIANS(lat)) * 69);
        SET lon2 = lng + dist / ABS(COS(RADIANS(lat)) * 69);

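        -- working table for the candidate pins (a regular table, not TEMPORARY);
        -- the indexes keep the per-pin DELETE and the GROUP BY from scanning the whole table
        -- (IF NOT EXISTS will not add them to an already existing table; use ALTER TABLE for that)
        CREATE TABLE IF NOT EXISTS POSTSINAREATBL(
            pin_id BIGINT NOT NULL,
            category_id BIGINT NOT NULL,
            distance DECIMAL(6,1),
            INDEX (pin_id),
            INDEX (category_id)
        );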

        -- store the unassigned pins that fall inside the rectangle
        INSERT INTO POSTSINAREATBL (pin_id, category_id, distance)
            SELECT pin_id, category_id,
                   ( 3959 * ACOS( COS(RADIANS(lat)) * COS(RADIANS(latitude))
                                * COS(RADIANS(longitude) - RADIANS(lng))
                                + SIN(RADIANS(lat)) * SIN(RADIANS(latitude)) ) ) AS distance
            FROM skoovy_prd.pins
            WHERE longitude BETWEEN lon1 AND lon2
              AND latitude BETWEEN lat1 AND lat2
              AND user_id = 0;

        select count(*) INTO num_results_in_area from POSTSINAREATBL;

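        assign_loop: WHILE num_results_in_area > 0 DO

            -- next user not yet processed; NOT EXISTS (unlike NOT IN) still works if the list ever contains a NULL
            SET use_this_user_id = (SELECT u.user_id FROM skoovy_prd.users u
                                    WHERE NOT EXISTS (SELECT 1 FROM skoovy_prd.posts_users_processed p
                                                      WHERE p.user_id = u.user_id)
                                    LIMIT 1);

            -- no users left: stop instead of looping forever
            IF use_this_user_id IS NULL THEN
                LEAVE assign_loop;
            END IF;

            INSERT INTO skoovy_prd.posts_users_processed (user_id) VALUES (use_this_user_id);

            SET isDone = 0;
            OPEN cur_posts_to_assign_to_user;
            REPEAT
                FETCH cur_posts_to_assign_to_user INTO this_iter_pin_id;

                -- the final FETCH sets isDone through the NOT FOUND handler; without this check the
                -- previous pin is processed again and the counter is decremented once too often
                IF NOT isDone THEN
                    UPDATE skoovy_prd.pins SET pins.user_id = use_this_user_id WHERE pins.pin_id = this_iter_pin_id;

                    DELETE FROM POSTSINAREATBL WHERE pin_id = this_iter_pin_id;

                    SET num_results_in_area = num_results_in_area - 1;
                END IF;

            UNTIL isDone END REPEAT;

            CLOSE cur_posts_to_assign_to_user;

        END WHILE assign_loop;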

        TRUNCATE TABLE POSTSINAREATBL;

        SELECT CONCAT('complete') AS results;

    END IF;

END$$

DELIMITER ;
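To see why the swapped `lat`/`lng` assignment and the 111.04 (km-per-degree) constant matter, the two rectangle computations can be reproduced outside MySQL. This sketch assumes the caller passes arguments in the declared order (`mylon`, `mylat`); the function names are hypothetical. The same swap also feeds the wrong values into the `distance` formula.

```python
import math


def box_as_written(mylon, mylat, dist):
    """Rectangle the original procedure ends up using (second block, after the swap)."""
    lat, lng = mylon, mylat  # SET lat=mylon; SET lng=mylat;
    lon1 = lng - dist / abs(math.cos(math.radians(lat)) * 111.04)
    lon2 = lng + dist / abs(math.cos(math.radians(lat)) * 111.04)
    lat1 = lat - dist / 111.04
    lat2 = lat + dist / 111.04
    return lat1, lat2, lon1, lon2


def box_fixed(mylon, mylat, dist_miles):
    """Rectangle from the miles-based block, with latitude and longitude the right way round."""
    dlat = dist_miles / 69.0
    dlon = dist_miles / abs(math.cos(math.radians(mylat)) * 69.0)
    return mylat - dlat, mylat + dlat, mylon - dlon, mylon + dlon


def contains(box, lat, lon):
    """True if (lat, lon) lies inside the rectangle, like the BETWEEN filters."""
    lat1, lat2, lon1, lon2 = box
    return lat1 <= lat <= lat2 and lon1 <= lon <= lon2


# Search centred on lat 40, lon -75 with a 25-mile radius.
print(box_as_written(-75.0, 40.0, 25))  # latitude around -75, longitude around 40
print(box_fixed(-75.0, 40.0, 25))       # latitude around 40, longitude around -75
```

The original rectangle does not even contain its own centre point, so the `INSERT ... SELECT` searches the wrong part of the map.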

1 Answer:

Answer 0 (score: 0)

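Have you tried a spatial index? I haven't tried it on MySQL, but PostgreSQL has the PostGIS extension, which provides a full set of useful functions for this kind of task. It really helped me with the same problem. http://postgis.net/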
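The benefit of a spatial index is that a rectangle query only visits the part of the data near the search area instead of scanning every row. MySQL's spatial indexes are R-trees; the sketch below swaps in a simpler technique, a uniform latitude/longitude grid, purely to illustrate the idea. `GridIndex` and the cell size are hypothetical, and the sketch ignores wrap-around at ±180° longitude.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform lat/lon grid: a simplified stand-in for a spatial index (not an R-tree)."""

    def __init__(self, cell_deg):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)  # (row, col) -> [(pin_id, lat, lon), ...]

    def _cell(self, lat, lon):
        return math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg)

    def insert(self, pin_id, lat, lon):
        self.cells[self._cell(lat, lon)].append((pin_id, lat, lon))

    def query_box(self, lat1, lat2, lon1, lon2):
        """Return pin_ids inside the rectangle, visiting only the cells it overlaps."""
        r1, c1 = self._cell(lat1, lon1)
        r2, c2 = self._cell(lat2, lon2)
        found = []
        for r in range(r1, r2 + 1):
            for c in range(c1, c2 + 1):
                # .get() does not create empty cells in the defaultdict
                for pin_id, lat, lon in self.cells.get((r, c), ()):
                    if lat1 <= lat <= lat2 and lon1 <= lon <= lon2:
                        found.append(pin_id)
        return found
```

With half-degree cells, a 25-mile rectangle touches only a handful of cells, so the cost of a lookup depends on the number of pins near the search point rather than on all 25 million rows.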

Check this: http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html