Question

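We run an e-commerce portal on a PostgreSQL 9.1 database. One very important table currently holds 32 million records. If we want to offer all items, this table would grow to 320 million records, mostly dates, which would be too heavy.

So we are thinking about horizontal partitioning or sharding. We could split the items in this table horizontally into 12 pieces (one per month). What are the best steps and techniques for doing so? Is horizontal partitioning within the database good enough, or do we have to start thinking about sharding?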

Answer 1

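While 320 million rows isn't small, it isn't really huge either.

It largely depends on the queries you run on the table. If you always include the partition key in your queries, then "regular" partitioning will probably work.

An example of this can be found in the PostgreSQL wiki: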
http://wiki.postgresql.org/wiki/Month_based_partitioning

The manual also explains some of the caveats of partitioning:
http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html
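
As a rough sketch of what month-based partitioning looks like on 9.1 (table inheritance plus CHECK constraints, as described in the manual): the table name items, its columns, and the trigger names below are made up for illustration and are not taken from the question. Only two months are shown.

```sql
-- Hypothetical parent table; it stays empty, the children hold the rows.
CREATE TABLE items (
    id        bigint NOT NULL,
    item_date date   NOT NULL,
    payload   text
);

-- One child per month; the CHECK constraint lets the planner skip it.
CREATE TABLE items_2012_01 (
    CHECK (item_date >= DATE '2012-01-01' AND item_date < DATE '2012-02-01')
) INHERITS (items);

CREATE TABLE items_2012_02 (
    CHECK (item_date >= DATE '2012-02-01' AND item_date < DATE '2012-03-01')
) INHERITS (items);

-- Indexes are not inherited, so each child needs its own.
CREATE INDEX items_2012_01_date_idx ON items_2012_01 (item_date);
CREATE INDEX items_2012_02_date_idx ON items_2012_02 (item_date);

-- Redirect inserts on the parent to the matching child table.
-- Raises an error if no child exists for the row's month.
CREATE OR REPLACE FUNCTION items_insert_router()
RETURNS trigger AS $$
BEGIN
    EXECUTE 'INSERT INTO items_' || to_char(NEW.item_date, 'YYYY_MM')
         || ' SELECT ($1).*' USING NEW;
    RETURN NULL;  -- the row is already stored in the child; skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_insert_trigger
    BEFORE INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE items_insert_router();

-- 'partition' is the default setting; shown here only to make it explicit.
SET constraint_exclusion = partition;

-- With the partition key in the WHERE clause, the plan should only
-- touch the parent and items_2012_02.
EXPLAIN SELECT * FROM items
WHERE item_date >= DATE '2012-02-01' AND item_date < DATE '2012-03-01';
```

Queries that do not filter on item_date still scan every child, which is why the partition key in the WHERE clause matters so much.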

If you are thinking about sharding, you might want to read how Instagram (which is powered by PostgreSQL) implemented it:

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
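
To give an idea of the technique, here is a minimal sketch in the spirit of that post: a 64-bit ID built from a millisecond timestamp, a logical shard number, and a per-shard sequence. The function and sequence names, the epoch value, the shard number, and the 41/13/10 bit split are assumptions for illustration; check the post for the layout they actually use.

```sql
-- Hypothetical per-shard sequence.
CREATE SEQUENCE shard_id_seq;

CREATE OR REPLACE FUNCTION next_sharded_id(OUT result bigint) AS $$
DECLARE
    our_epoch  bigint := 1314220021721;  -- custom epoch in milliseconds (assumed)
    shard_id   bigint := 5;              -- logical shard number (assumed, must be < 8192)
    seq_id     bigint;
    now_millis bigint;
BEGIN
    -- Low 10 bits: sequence value, wrapping every 1024 ids.
    seq_id := nextval('shard_id_seq') % 1024;
    -- Current time in milliseconds since the Unix epoch.
    now_millis := floor(extract(epoch FROM clock_timestamp()) * 1000);
    -- High 41 bits: milliseconds since the custom epoch.
    result := (now_millis - our_epoch) << 23;
    -- Middle 13 bits: shard number.
    result := result | (shard_id << 10);
    result := result | seq_id;
END;
$$ LANGUAGE plpgsql;

-- Could be used as a column default on each shard's tables.
SELECT next_sharded_id();
```

Because the timestamp sits in the high bits, IDs sort roughly by creation time, and the shard can be recovered from the ID alone.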

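If you have mostly read queries, another option might be to use streaming replication to set up multiple servers and distribute the read queries: connect to a hot standby for read access and to the master for write access. I think pg-pool II can do that (somewhat) automatically. This can be combined with partitioning to reduce query runtime further.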
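
If the application does the routing itself, it needs to know which kind of server a connection points to. A small sketch using built-in functions that are available in 9.1 (the column alias is just illustrative):

```sql
-- Returns true on a hot standby (read-only), false on the master.
SELECT pg_is_in_recovery();

-- On a standby: time since the last replayed transaction was committed
-- on the master. This is only a rough lag estimate; it keeps growing
-- while the master is idle.
SELECT now() - pg_last_xact_replay_timestamp() AS approx_replay_lag;
```

Any write sent to a hot standby fails with a read-only transaction error, so reads that must see the very latest data should still go to the master.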

If you are adventurous and don't need this immediately, you might also consider Postgres-XC, which promises to support transparent horizontal scaling:
http://postgres-xc.sourceforge.net/

There is no final release yet, but it does not look like it will take much longer.

Answer 2

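Here is my sample code for partitioning. t_master is a view that your application selects from, inserts into, updates, and deletes from; t_1 and t_2 are the underlying tables that actually store the data.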

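-- The base tables must exist before the view that references them.
CREATE TABLE t_1
(
  id bigint PRIMARY KEY,
  col1 text
);

CREATE TABLE t_2
(
  id bigint PRIMARY KEY,
  col1 text
);

-- t_master presents both partitions as a single relation.
create or replace view t_master(id, col1)
as
select id, col1 from t_1
union all
select id, col1 from t_2;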



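-- Routes each inserted row to t_1 or t_2 based on id (assumes id >= 0).
CREATE OR REPLACE FUNCTION t_insert_partition_function()
returns TRIGGER AS $$
begin
    execute 'insert into t_'
        || ( mod(NEW.id, 2)+ 1 )
        || ' values ( $1, $2 )' USING NEW.id, NEW.col1 ;
    -- Returning NEW (rather than NULL) signals that the row was handled,
    -- so it is counted in the command's affected-row total.
    RETURN NEW;
end;
$$
LANGUAGE plpgsql;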

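-- Updates a row in its partition; moves it if the new id maps elsewhere.
CREATE OR REPLACE FUNCTION t_update_partition_function()
returns TRIGGER AS $$
begin
    if mod(NEW.id, 2) = mod(OLD.id, 2) then
        -- Same partition: update in place, locating the row by its old id.
        execute 'update t_'
            || ( mod(OLD.id, 2)+ 1 )
            || ' set id = $1, col1 = $2 where id = $3'
            USING NEW.id, NEW.col1, OLD.id ;
    else
        -- The id change crosses partitions: delete the old row, insert the new one.
        execute 'delete from t_' || ( mod(OLD.id, 2)+ 1 ) || ' where id = $1'
            USING OLD.id ;
        execute 'insert into t_' || ( mod(NEW.id, 2)+ 1 ) || ' values ( $1, $2 )'
            USING NEW.id, NEW.col1 ;
    end if;
    RETURN NEW;
end;
$$
LANGUAGE plpgsql;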

-- Deletes the row from whichever partition its id maps to.
CREATE OR REPLACE FUNCTION t_delete_partition_function()
returns TRIGGER AS $$
begin
    execute 'delete from t_'
        || ( mod(OLD.id, 2)+ 1 )
        || ' where id = $1'
        USING OLD.id;
    -- Returning OLD (rather than NULL) counts the row as deleted.
    RETURN OLD;
end;
$$
LANGUAGE plpgsql;



CREATE TRIGGER t_insert_partition_trigger instead of INSERT
ON t_master FOR each row 
execute procedure t_insert_partition_function();

CREATE TRIGGER t_update_partition_trigger instead of update
ON t_master FOR each row 
execute procedure t_update_partition_function();

CREATE TRIGGER t_delete_partition_trigger instead of delete
ON t_master FOR each row 
execute procedure t_delete_partition_function();
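
A short usage sketch for the objects above, assuming the tables, view, functions, and triggers have all been created. The sample values are made up. Note that INSTEAD OF triggers on views need PostgreSQL 9.1 or later.

```sql
-- The application only ever talks to the view.
INSERT INTO t_master (id, col1) VALUES (1, 'odd'), (2, 'even');

-- The rows were routed by mod(id, 2) + 1:
SELECT * FROM t_1;   -- holds id 2, since mod(2, 2) + 1 = 1
SELECT * FROM t_2;   -- holds id 1, since mod(1, 2) + 1 = 2

-- Updates and deletes go through the view as well.
UPDATE t_master SET col1 = 'changed' WHERE id = 1;
DELETE FROM t_master WHERE id = 2;

-- Only the row with id 1 remains.
SELECT * FROM t_master;
```

Keep in mind that a SELECT on t_master always reads both base tables; unlike inheritance partitioning with CHECK constraints, the planner has nothing here that lets it skip a partition.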

Answer 3

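If you don't mind upgrading to PostgreSQL 9.4, you could use the pg_shard extension, which lets you transparently shard a PostgreSQL table across many machines. Every shard is stored as a regular PostgreSQL table on another PostgreSQL server and replicated to other servers. It uses hash partitioning to decide which shard(s) to use for a given query. pg_shard works well if your queries have a natural partition dimension (e.g., customer ID).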

More info: https://github.com/citusdata/pg_shard
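
Roughly, setting up a distributed table looks like the sketch below, based on my reading of the pg_shard README. The table, its columns, the shard count, and the replication factor are illustrative. It also assumes pg_shard is already listed in shared_preload_libraries and the worker nodes are configured as the README describes.

```sql
CREATE EXTENSION pg_shard;

-- Hypothetical table, created on the master node.
CREATE TABLE customer_reviews (
    customer_id   text NOT NULL,
    review_date   date,
    review_rating integer
);

-- Declare the table as distributed, hash-partitioned on customer_id.
SELECT master_create_distributed_table('customer_reviews', 'customer_id');

-- Create 16 shards on the workers, each stored on 2 servers.
SELECT master_create_worker_shards('customer_reviews', 16, 2);

-- Queries that include the partition column are routed by its hash.
INSERT INTO customer_reviews (customer_id, review_rating) VALUES ('HN802', 5);
SELECT avg(review_rating) FROM customer_reviews WHERE customer_id = 'HN802';
```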

What are the proper steps for horizontal partitioning in PostgreSQL?
