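I have a bunch of products (around 500k) in a database created over the past few years, and I'd like to group them together (Rails 2.3.14).
Ideally, products would be treated as one group if they meet the following criteria: they belong to the same company, and they were created within about 10 minutes of each other.
What I'm trying to accomplish: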
def self.package_products
  Company.each do |company|
    package = Package.new
    products = Product.find(:all, :conditions => [:company_id = company && created_around_similar_times])
    package.contents = first_few_product_descriptions
    package.save!
    products.update_all(:package_id => package.id)
  end
end
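To me, this smells bad. I don't like looping over companies, and I can't help thinking there is a better way. Does anyone have some SQL-fu that can group similar items? Basically, I want to find products from the same company that were created within 10 minutes of each other and assign them the same package_id.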
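Answer 0 (score: 2)
This is hard to do in pure SQL. I would resort to a plpgsql procedure.
Say your table looks like this:
(Next time, please post a table definition. It is worth more than a thousand words.)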
create table p (
  id serial primary key -- or whatever your primary key is!
, company_id int4 NOT NULL
, create_time timestamp NOT NULL
, for_sale bool NOT NULL
);
Use a plpgsql function like this:
CREATE OR REPLACE FUNCTION f_p_group()
  RETURNS void AS
$BODY$
DECLARE
    g_id            integer := 1;
    last_time       timestamp;
    last_company_id integer;
    r               p%ROWTYPE;
BEGIN
    -- If the table is huge, special settings for these parameters will help
    SET temp_buffers = '100MB'; -- more RAM for temp table, adjust to actual size of p
    SET work_mem = '100MB'; -- more RAM for sorting

    -- create temp table just like original.
    CREATE TEMP TABLE tmp_p ON COMMIT DROP AS
    SELECT * FROM p LIMIT 0; -- no rows yet

    -- add group_id.
    ALTER TABLE tmp_p ADD column group_id integer;

    -- loop through table, write row + group_id to temp table
    FOR r IN
        SELECT * -- get the whole row!
        FROM p
        -- WHERE for_sale -- commented out, after it vanished from the question
        ORDER BY company_id, create_time -- group by company_id first, there could be several groups intertwined
    LOOP
        IF r.company_id <> last_company_id OR (r.create_time - last_time) > interval '10 min' THEN
            g_id := g_id + 1;
        END IF;

        INSERT INTO tmp_p SELECT r.*, g_id;

        last_time := r.create_time;
        last_company_id := r.company_id;
    END LOOP;

    TRUNCATE p;
    ALTER TABLE p ADD column group_id integer; -- add group_id now

    INSERT INTO p
    SELECT * FROM tmp_p; -- ORDER BY something?

    ANALYZE p; -- table has been rewritten, no VACUUM is needed.
END;
$BODY$
  LANGUAGE plpgsql;
Call it once, then discard it:
SELECT f_p_group();
DROP FUNCTION f_p_group();
Now all members of a group, per your definition, share a group_id.
One more note: for_sale is ignored in the query, since it no longer appears in the question.
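For readers who want to see the grouping rule in isolation, here is a minimal Ruby sketch of the same comparison the loop in f_p_group() performs: sort by company and creation time, then start a new group whenever the company changes or the gap to the previous row exceeds 10 minutes. It runs on in-memory data only and does not touch a database. The names Row, GAP_SECONDS and assign_groups are hypothetical and exist purely for illustration; they are not part of the function above or of Rails.

```ruby
# Hypothetical in-memory stand-in for one row of table p.
Row = Struct.new(:id, :company_id, :create_time)

GAP_SECONDS = 10 * 60 # the 10-minute threshold from the question

# Returns a hash of { row id => group id }.
# A new group starts when the company changes or when the gap to the
# previous row (in sorted order) is larger than the threshold.
# The very first row never bumps the counter, so it lands in group 1,
# just as the NULL comparison does in the plpgsql loop.
def assign_groups(rows, gap = GAP_SECONDS)
  group_id = 1
  prev = nil
  result = {}
  rows.sort_by { |r| [r.company_id, r.create_time] }.each do |r|
    if prev && (r.company_id != prev.company_id ||
                r.create_time - prev.create_time > gap)
      group_id += 1
    end
    result[r.id] = group_id
    prev = r
  end
  result
end

rows = [
  Row.new(1, 1, Time.utc(2011, 1, 1, 12, 0)),
  Row.new(2, 1, Time.utc(2011, 1, 1, 12, 8)),
  Row.new(3, 1, Time.utc(2011, 1, 1, 12, 16)), # 8 min after row 2, 16 min after row 1
  Row.new(4, 1, Time.utc(2011, 1, 1, 13, 0)),
  Row.new(5, 2, Time.utc(2011, 1, 1, 12, 1))
]

groups = assign_groups(rows)
p groups
```

In this sample, rows 1 to 3 end up in one group even though rows 1 and 3 are 16 minutes apart, because each row is compared only with its immediate predecessor. That chaining behaviour matches the function above; if a group must never span more than 10 minutes in total, the comparison would have to be made against the first row of the group instead.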