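PostgreSQL 9.3 replication fails to start after pg_basebackup completes

Asked: 2014-09-11 22:27:26

Tags: postgresql database-replication postgresql-9.3

I'm trying to set up a hot_standby server, and I get the following error after pg_basebackup completes. Note that I'm using a shell script, replicator.sh, to start replication. Can anyone give me some insight?

My specs: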

  • Debian Wheezy 7.6
  • PostgreSQL 9.3
  • Database size: ~115GB

Error:

postgres@database-master:/etc/postgresql/9.3/main$ sh replicator.sh
Stopping PostgreSQL
[ ok ] Stopping PostgreSQL 9.3 database server: main.
Cleaning up old cluster directory
Starting base backup as replicator
Password:

113720266/113720266 kB (100%), 1/1 tablespace

NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
pg_basebackup: base backup completed
Starting Postgresql
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output:
2014-09-11 17:56:33 UTC LOG: database system was interrupted; last known up at 2014-09-11 16:54:29 UTC
2014-09-11 17:56:33 UTC LOG: creating missing WAL directory "pg_xlog/archive_status"
2014-09-11 17:56:33 UTC LOG: incomplete startup packet
2014-09-11 17:56:33 UTC LOG: invalid checkpoint record
2014-09-11 17:56:33 UTC FATAL: could not locate required checkpoint record
2014-09-11 17:56:33 UTC HINT: If you are not restoring from a backup, try removing the file "/var/lib/p[FAILesql/9.3/main/backup_label".
2014-09-11 17:56:33 UTC LOG: startup process (PID 21972) exited with exit code 1
2014-09-11 17:56:33 UTC LOG: aborting startup due to startup process failure
... failed! failed!

Contents of replicator.sh:

#!/bin/bash

echo Stopping PostgreSQL
/etc/init.d/postgresql stop

echo Cleaning up old cluster directory
rm -rf /var/lib/postgresql/9.3/main

echo Starting base backup as replicator
pg_basebackup -h 123.456.789.123 -D /var/lib/postgresql/9.3/main -U replicator -v -P

echo Writing recovery.conf file
sudo -u postgres bash -c "cat > /var/lib/postgresql/9.3/main/recovery.conf <<- _EOF1_
  standby_mode = 'on'
  primary_conninfo = 'host=123.456.789.123 port=5432  user=replicator password=XXXXX sslmode=require'
  trigger_file = '/tmp/postgresql.trigger'
_EOF1_
"

echo Starting Postgresql
/etc/init.d/postgresql start

Thank you, Jack

1 answer:

Answer 0 (score: 4)

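My best guess from the above is that pg_basebackup failed, and your shell script neither checks its return code nor uses set -e to abort automatically on error, so it simply carried on.

It's also possible that you don't have WAL archiving configured, or that restore_command isn't set on the replica. In that case the transaction logs needed to start up from the base backup won't be available, and startup will fail.

I strongly recommend that you: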

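  • Use pg_basebackup -X stream so that the required transaction logs are copied along with the backup; and

  • Use set -e in your shell script, or test for errors with a suitable if ! pg_basebackup .... ; then block.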
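
As a rough sketch of both recommendations together, the backup step of replicator.sh might look something like the following. The host, user and data directory are copied from the question; the PG_BASEBACKUP, PGDATA_DIR and PRIMARY_HOST variables and the take_base_backup function are hypothetical names introduced only for illustration, so adapt them to your setup.

```shell
#!/bin/bash
# Sketch only: values are taken from the question, names are illustrative.
set -e  # abort the script on the first unhandled failing command

# Hypothetical overridable settings (defaults mirror replicator.sh)
PG_BASEBACKUP="${PG_BASEBACKUP:-pg_basebackup}"
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/9.3/main}"
PRIMARY_HOST="${PRIMARY_HOST:-123.456.789.123}"

take_base_backup() {
  # -X stream fetches the WAL the backup needs while the backup is running
  if ! "$PG_BASEBACKUP" -h "$PRIMARY_HOST" -D "$PGDATA_DIR" \
        -U replicator -X stream -v -P; then
    echo "pg_basebackup failed; not starting PostgreSQL" >&2
    return 1
  fi
  echo "base backup completed"
}

# In replicator.sh, call take_base_backup before writing recovery.conf
# and starting the server, so a failed backup stops the script there.
```

With set -e alone the script would already stop on a failed pg_basebackup; the explicit if block just gives a clearer message. Note that -X stream opens a second connection to the primary, so max_wal_senders there needs to allow at least two connections for the replicator user.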