这是我第一次在mysql中创建存储过程。
它的作用是什么,
first- get count of all records
second- loop through that table 1 by 1
third- compare each entry if it is a duplicate
fourth- insert duplicate in a temporary table
last- display duplicates
它可以在100-200个条目上正常工作但是在大于500+(有时是25k)的更大记录上它会抛出一条消息
错误代码:1172。结果包含多行
我已经搜索了这个问题,但没有一个(答案)帮助我解决我的问题。
请查看我的剧本
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
DECLARE i_sku VARCHAR(255);
DECLARE i_concatenated_attributes MEDIUMTEXT;
DECLARE f_sku VARCHAR(255);
DECLARE f_offer_type VARCHAR(255);
DECLARE f_name VARCHAR(255);
DECLARE f_product_owner VARCHAR(255);
DECLARE f_listing_city VARCHAR(255);
DECLARE f_listing_area VARCHAR(255);
DECLARE f_price DOUBLE;
DECLARE f_bedrooms INT;
DECLARE f_building_size INT;
DECLARE f_land_size INT;
DECLARE f_concatenated_attributes MEDIUMTEXT;
DECLARE f_duplicate_percentage INT;
SELECT COUNT(*) FROM unit_temp_listing INTO n;
CREATE TEMPORARY TABLE IF NOT EXISTS temp_temp (dup_sku VARCHAR(255), dup_percentage INT, attribs MEDIUMTEXT);
SET i=0;
WHILE i<n DO
-- Get all unit listings (one by one)
SELECT
sku, concat_ws(',',offer_type,name,product_owner,listing_city,listing_area,price,ifnull(bedrooms,0),ifnull(building_size,0),ifnull(land_size,0)) as concatenated_attributes
INTO i_sku, i_concatenated_attributes
FROM unit_temp_listing
limit 1 offset i;
-- Compare one by one (sadla)
SELECT
f.sku, f.offer_type, f.name, f.product_owner, f.listing_city, f.listing_area, f.price, f.bedrooms, f.building_size, f.land_size,
levenshtein_ratio(concat_ws(',',f.offer_type,f.name,f.product_owner,f.listing_city,f.listing_area,f.price,ifnull(f.bedrooms,0),ifnull(f.building_size,0),ifnull(f.land_size,0)),i_concatenated_attributes) as f_duplicate_percentage,
concat_ws(',',f.offer_type,f.name,f.product_owner,f.listing_city,f.listing_area,f.price,ifnull(f.bedrooms,0),ifnull(f.building_size,0),ifnull(f.land_size,0)) as fconcatenated_attributes
INTO f_sku, f_offer_type, f_name, f_product_owner, f_listing_city, f_listing_area, f_price, f_bedrooms, f_building_size, f_land_size, f_duplicate_percentage, f_concatenated_attributes
FROM unit_temp_listing f
WHERE substring(soundex(concat_ws(',',offer_type,name,product_owner,listing_city,listing_area,price,ifnull(bedrooms,0),ifnull(building_size,0),ifnull(land_size,0))),1,10) = substring(soundex(i_concatenated_attributes),1,10)
AND levenshtein_ratio(concat_ws(',',offer_type,name,product_owner,listing_city,listing_area,price,ifnull(bedrooms,0),ifnull(building_size,0),ifnull(land_size,0)),i_concatenated_attributes) > 90
AND f.sku != i_sku;
-- INSERT duplicates
IF(f_sku IS NOT NULL) THEN
INSERT INTO temp_temp (dup_sku, dup_percentage, attribs) VALUES (f_sku, f_duplicate_percentage, f_concatenated_attributes);
SET f_sku = null;
SET f_duplicate_percentage = null;
SET f_concatenated_attributes = null;
END IF;
SET i = i + 1;
END WHILE;
SELECT * FROM temp_temp;
DROP TABLE temp_temp;
End
有什么问题?
答案 0 :(得分:0)
我的同事们欢迎你们我已经解决了我的问题,我想在这里分享。
问题是,SELECT
循环中的 SECOND WHILE
语句返回了多行。我尝试使用CURSOR
,但仍然收到错误消息。所以我试着把我的第二个SELECT
语句放在INSERT
语句中,就像这样
INSERT INTO table_name (columns, ...) SELECT_STATEMENT
然后,问题解决了!但如果这里的任何人有想法优化我的查询,请帮助我。我必须处理20k +记录,但由于执行时间太长,我只能在500分钟内完成15-20分钟。
感谢18个观点(在撰写本文时)。
快乐的编码!