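I have a stored procedure (provided below) that I wrote to process roughly 25 million records. Given a lat & lon, a distance (25 miles), and a number of records to assign (12), it finds all records inside the bounding box for that 25-mile distance and assigns up to 12 of them to a user who has not been processed yet. A user can have only one record per category, so each of the 12 records has a different category.
The stored procedure works. The only problem is that it takes a very long time. I created eight copies in total, each identical except for the working table (POSTSINAREATBL[1-8]), so that I could speed the process up. I have now been running the scripts for 4 days and have processed only 3.5 million of the 25 million records.
I am hoping someone has some insight and can help speed this up. I really need all the records processed within the next 1-2 days, and at the current rate it would take nearly a month!
Also, with the 8 scripts running, my CPU sits at 99.8%, so I am maxed out on capacity.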
DELIMITER $$
CREATE PROCEDURE `get_pins_in_boundaries`(IN mylon double, IN mylat double, IN dist int, IN numrecords int)
BEGIN
declare isDone INT;
declare lat float;
declare lng float;
declare lon1 float;
declare lon2 float;
declare lat1 float;
declare lat2 float;
declare this_iter_pin_id int;
declare use_this_user_id int;
DECLARE num_results_in_area int;
DECLARE cur_posts_to_assign_to_user CURSOR FOR select pin_id from POSTSINAREATBL group by category_id limit numrecords;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET isDone = 1;
IF mylon = 0.000000 OR mylat = 0.000000 THEN
SELECT CONCAT('complete') AS results;
ELSE
-- local copies of the search centre: lat is the latitude, lng the longitude
SET lat=mylat;
SET lng=mylon;
-- calculate lon and lat for the rectangle; dist is in miles,
-- and one degree of latitude is roughly 69 miles
set lon1 = mylon - dist / abs(cos(radians(mylat)) * 69);
set lon2 = mylon + dist / abs(cos(radians(mylat)) * 69);
set lat1 = mylat - (dist / 69);
set lat2 = mylat + (dist / 69);
-- create temp table and store records matching criteria into table
CREATE TABLE IF NOT EXISTS POSTSINAREATBL(
pin_id BIGINT NOT NULL,
category_id BIGINT NOT NULL,
distance DECIMAL(6,1)
);
INSERT INTO POSTSINAREATBL (
SELECT pin_id,category_id, ( 3959 * acos( cos( radians(lat) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians(lng) ) + sin( radians(lat) ) * sin(radians(latitude)) ) ) as distance
FROM skoovy_prd.pins
WHERE longitude between lon1 and lon2
and latitude between lat1 and lat2
and user_id =0
);
select count(*) INTO num_results_in_area from POSTSINAREATBL;
WHILE num_results_in_area > 0 DO
SET use_this_user_id = (SELECT user_id from skoovy_prd.users WHERE user_id NOT IN(select user_id from skoovy_prd.posts_users_processed) LIMIT 1);
INSERT INTO skoovy_prd.posts_users_processed (user_id) VALUES(use_this_user_id);
SET isDone = 0;
OPEN cur_posts_to_assign_to_user;
REPEAT
FETCH cur_posts_to_assign_to_user INTO this_iter_pin_id;
-- only touch the tables when FETCH actually returned a row
IF isDone = 0 THEN
UPDATE skoovy_prd.pins SET pins.user_id = use_this_user_id WHERE pins.pin_id = this_iter_pin_id;
DELETE FROM POSTSINAREATBL WHERE pin_id = this_iter_pin_id;
SET num_results_in_area = num_results_in_area - 1;
END IF;
UNTIL isDone END REPEAT;
CLOSE cur_posts_to_assign_to_user;
END WHILE;
TRUNCATE TABLE POSTSINAREATBL;
SELECT CONCAT('complete') AS results;
END IF;
END$$
DELIMITER ;
Answer 0: (score: 0)
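Have you tried using a spatial index? I haven't tried it on MySQL, but for PostgreSQL there is the PostGIS extension, which provides a full set of useful functions for this kind of task. It really helped me with the same task. http://postgis.net/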
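To give a rough idea of the PostGIS route, here is a minimal sketch. It assumes a PostgreSQL copy of the `pins` table with the same `pin_id`, `category_id`, `user_id`, `longitude` and `latitude` columns as in the question. The column name `geog`, the index name `idx_pins_geog`, and the sample centre point (-75.0, 40.0) are made up for the example.

```sql
-- Hypothetical sketch: PostgreSQL with the PostGIS extension available.
CREATE EXTENSION IF NOT EXISTS postgis;

-- Store each pin as a geography point (assumed column name: geog).
ALTER TABLE pins ADD COLUMN geog geography(Point, 4326);

-- ST_MakePoint takes x = longitude first, then y = latitude.
UPDATE pins
SET geog = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography;

-- A GiST index lets radius searches avoid scanning the whole table.
CREATE INDEX idx_pins_geog ON pins USING GIST (geog);

-- Unassigned pins within 25 miles of the sample point.
-- ST_DWithin on geography measures in metres (1 mile = 1609.344 m).
SELECT pin_id, category_id
FROM pins
WHERE user_id = 0
  AND ST_DWithin(
        geog,
        ST_SetSRID(ST_MakePoint(-75.0, 40.0), 4326)::geography,
        25 * 1609.344
      );
```

With 25 million rows, the backfill and the index build are one-off costs. After that, each radius lookup should only touch the rows near the search point.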
Check this: http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
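On the MySQL side, the same idea might look roughly like the sketch below. I have not run it against your data, so treat it as a starting point. It assumes MySQL 5.7 or later with an InnoDB `pins` table (older versions only support spatial indexes on MyISAM), and it assumes `longitude` and `latitude` are never NULL. The column name `geo_point`, the index name `idx_pins_geo_point`, and the sample centre point are invented for the example. As far as I know, on MySQL 8.0 the column would also need an SRID attribute before the optimizer will use the index.

```sql
-- Hypothetical sketch: MySQL 5.7+, InnoDB, no NULL coordinates.

-- 1. Add a point column; nullable at first so existing rows can be backfilled.
ALTER TABLE skoovy_prd.pins ADD COLUMN geo_point POINT NULL;

-- 2. Backfill from the existing columns (x = longitude, y = latitude).
UPDATE skoovy_prd.pins SET geo_point = POINT(longitude, latitude);

-- 3. A spatial index requires a NOT NULL column.
ALTER TABLE skoovy_prd.pins
  MODIFY geo_point POINT NOT NULL,
  ADD SPATIAL INDEX idx_pins_geo_point (geo_point);

-- 4. Bounding-box lookup for a sample centre and a 25-mile distance.
SET @lat = 40.0, @lon = -75.0, @dist = 25;
SET @dlat = @dist / 69;                            -- degrees of latitude
SET @dlon = @dist / ABS(COS(RADIANS(@lat)) * 69);  -- degrees of longitude

-- The bounding rectangle of the diagonal line is the search box.
SELECT pin_id, category_id
FROM skoovy_prd.pins
WHERE MBRContains(
        LineString(
          Point(@lon - @dlon, @lat - @dlat),
          Point(@lon + @dlon, @lat + @dlat)
        ),
        geo_point
      )
  AND user_id = 0;
```

The rectangle test replaces the two `BETWEEN` conditions in the procedure's `INSERT ... SELECT`. The exact distance formula from the question can still be applied to the rows that survive it.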