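We are planning a large Greenplum DB (growing from 10 TB to 100 TB within the first 18 months). Traditional backup and restore tools won't help, because we have a 24-hour RPO/RTO to meet. Is there a way to replicate the database to our DR site without resorting to block replication (i.e. putting the segments on a SAN and mirroring)?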
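Answer 0 (score: 1)
You have several options to choose from:
Greenplum currently has no built-in WAN replication solution, so these are pretty much all the options available.
Answer 1 (score: 0)
I did some investigation into this. Here are my results.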
I. Using EMC Symmetrix VMAX SAN (Storage Area Network) mirroring and SRDF (Symmetrix Remote Data Facility) remote replication software
Please refer to h12079-vnx-replication-technologies-overview-wp.pdf for details.
Preconditions
1. An EMC Symmetrix VMAX SAN is installed
2. SRDF software is available
Advantages of the 3 different modes
1. Symmetrix Remote Data Facility / Synchronous (SRDF/S)
Provides a no-data-loss solution (zero RPO).
No server resource contention for the remote mirroring operation.
Can restore the primary site with minimal impact on application performance at the remote site.
Enterprise disaster recovery solution.
Supports replicating over IP and Fibre Channel protocols.
2. Symmetrix Remote Data Facility / Asynchronous (SRDF/A)
Extended-distance data replication that supports longer distances than SRDF/S.
Does not affect host performance, because host activity is decoupled from the remote copy process.
Efficient link utilization that results in lower link-bandwidth requirements.
Facilities to invoke failover and restore operations.
Supports replicating over IP and Fibre Channel protocols.
3. Symmetrix Remote Data Facility / Data Mobility (SRDF/DM)
II. Using Backup Tools
Please refer to http://gpdb.docs.pivotal.io/4350/admin_guide/managing/backup.html for details
Parallel Backup
Parallel backup utility gpcrondump
Non-parallel backup
Not recommended. It is intended for migrating PostgreSQL databases to Greenplum.
Parallel Restore
Supports restoring to a system with the same configuration as the source Greenplum database, or to one with a different configuration.
Non-Parallel Restore
pg_restore requires modifying the CREATE TABLE statements to add a DISTRIBUTED BY clause.
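As a rough illustration of the DISTRIBUTED BY edit, here is a minimal Python sketch. It assumes the dump is plain SQL text and that each CREATE TABLE statement has already been isolated as one string ending in a semicolon. The function name add_distributed_by and the choice of distribution column are hypothetical and not part of any Greenplum tool.

```python
from typing import Sequence


def add_distributed_by(create_stmt: str, columns: Sequence[str]) -> str:
    """Append a DISTRIBUTED BY clause to a plain CREATE TABLE statement.

    Assumption: the statement is a simple PostgreSQL CREATE TABLE with no
    trailing clauses that must come after DISTRIBUTED BY.
    """
    stmt = create_stmt.strip()
    upper = stmt.upper()
    if not upper.startswith("CREATE TABLE"):
        raise ValueError("expected a CREATE TABLE statement")
    if not columns:
        raise ValueError("at least one distribution column is required")
    # Leave statements alone if they already declare a distribution policy.
    if "DISTRIBUTED BY" in upper or "DISTRIBUTED RANDOMLY" in upper:
        return stmt
    # Drop the trailing semicolon, add the clause, then restore the semicolon.
    body = stmt[:-1].rstrip() if stmt.endswith(";") else stmt
    return f"{body}\nDISTRIBUTED BY ({', '.join(columns)});"


if __name__ == "__main__":
    print(add_distributed_by("CREATE TABLE sales (id int, region text);", ["id"]))
```

A real dump would need a proper SQL splitter in front of this, since statements can span many lines and contain semicolons inside string literals.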
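Disadvantages
1. The backup process takes an EXCLUSIVE lock on the catalog table pg_class. While that lock is held, pg_class can only be read, so operations that modify the catalog (such as creating or dropping tables) have to wait.
2. After releasing the EXCLUSIVE lock on pg_class, it takes an ACCESS SHARE lock on every table being dumped. ACCESS SHARE conflicts only with ACCESS EXCLUSIVE, so commands that need an ACCESS EXCLUSIVE lock on those tables (for example DROP TABLE, TRUNCATE, and most forms of ALTER TABLE) are blocked until the dump finishes.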
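To make the effect of the two lock modes concrete, here is a small Python sketch of the relevant rows of the standard PostgreSQL table-level lock conflict matrix. It is only a lookup table for illustration, assuming Greenplum follows the PostgreSQL matrix for these two modes; the helper name is_blocked is made up for this example.

```python
# Rows of the PostgreSQL table-level lock conflict matrix for the two modes
# the backup takes. Key: lock mode held by the backup. Value: lock modes
# that another session cannot acquire on the same table in the meantime.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "EXCLUSIVE": {
        "ROW SHARE",
        "ROW EXCLUSIVE",
        "SHARE UPDATE EXCLUSIVE",
        "SHARE",
        "SHARE ROW EXCLUSIVE",
        "EXCLUSIVE",
        "ACCESS EXCLUSIVE",
    },
}


def is_blocked(held: str, requested: str) -> bool:
    """Return True if `requested` must wait while `held` is in place."""
    return requested in CONFLICTS[held]


if __name__ == "__main__":
    # A plain SELECT takes ACCESS SHARE; DROP TABLE takes ACCESS EXCLUSIVE.
    for held in ("EXCLUSIVE", "ACCESS SHARE"):
        for requested in ("ACCESS SHARE", "ROW EXCLUSIVE", "ACCESS EXCLUSIVE"):
            state = "blocked" if is_blocked(held, requested) else "allowed"
            print(f"backup holds {held}: {requested} is {state}")
```

In other words, the EXCLUSIVE phase on pg_class lets other sessions read the catalog but nothing more, while the ACCESS SHARE phase on user tables mainly gets in the way of DDL.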
III. Replay DML and DDL statements
PostgreSQL has a parameter that logs all SQL statements to a file.
In data/postgresql.conf, set log_statement to 'all'.
Write an application that extracts the DML and DDL statements from the log and runs them on the DR servers.
Advantages
1. Easy to configure and maintain
2. No decrease in performance
Disadvantage
1. Needs additional storage for the statement log
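The extraction step of option III could look roughly like the Python sketch below. It assumes the plain-text PostgreSQL log format, where each logged statement appears as "LOG:  statement: ..." and continuation lines of a multi-line statement start with a tab. Greenplum's own log files may use a different (CSV) layout, so the regular expression would need adapting. The function names and the keyword list are illustrative choices, not an existing tool.

```python
import re

# Assumption: plain-text log lines of the form "<prefix> LOG:  statement: <sql>".
STATEMENT_RE = re.compile(r"LOG:\s+statement:\s+(.*)$")

# Leading keywords treated as replayable DML/DDL in this sketch.
REPLAY_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "TRUNCATE", "CREATE", "ALTER", "DROP"}


def is_replayable(sql):
    """True if the statement starts with one of the DML/DDL keywords."""
    words = sql.split(None, 1)
    return bool(words) and words[0].upper() in REPLAY_KEYWORDS


def extract_statements(lines):
    """Collect logged statements, in log order, keeping only DML and DDL."""
    statements = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = STATEMENT_RE.search(line)
        if match:
            # A new statement starts; flush the previous one first.
            if current is not None:
                statements.append(current)
            current = match.group(1)
        elif current is not None and line.startswith("\t"):
            # Tab-indented line: continuation of a multi-line statement.
            current += "\n" + line.strip()
        elif current is not None:
            # Any other log entry ends the statement being collected.
            statements.append(current)
            current = None
    if current is not None:
        statements.append(current)
    return [s for s in statements if is_replayable(s)]


if __name__ == "__main__":
    sample = [
        "2015-01-01 10:00:00 UTC LOG:  statement: CREATE TABLE t (id int);\n",
        "2015-01-01 10:00:01 UTC LOG:  statement: SELECT count(*) FROM t;\n",
        "2015-01-01 10:00:02 UTC LOG:  statement: INSERT INTO t\n",
        "\tVALUES (1);\n",
        "2015-01-01 10:00:03 UTC LOG:  duration: 0.5 ms\n",
    ]
    for sql in extract_statements(sample):
        print(sql)
```

This ignores transaction boundaries, rolled-back statements, COPY loads, and functions with side effects, all of which a real replay application would have to handle before the DR copy could be trusted.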
IV. Parse the PostgreSQL WAL
Parse the WAL (write-ahead log) to extract the DDL statements, then run the generated DDL statements on the DR Greenplum cluster.
Advantage
1. Doesn't impact the source Greenplum database
Disadvantages
1. Requires writing code to parse the WAL
2. The WAL is not easy to parse, and there is little documentation on its format.
3. It is unclear whether this is feasible for Greenplum, since it is a solution for PostgreSQL.