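We have an Auto Scaling group in AWS whose instances hit a database connection timeout at launch. This is a Zend Framework 1 application running PHP 7.0.22-0ubuntu0.16.04.1 on Ubuntu 16.04.3 LTS.
The code is baked into the AMI, but during launch a user data script pulls from git and runs phing to configure the application. The database domain has not changed in years; the configuration step exists mainly to handle which ElastiCache instance to use. In other words, the baked code already has the database configured, and the configuration step only overwrites it with the same value.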
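Once the EC2 instance is in the ELB, the load balancer starts hitting /health-check to determine whether the instance is healthy. The controller contains the following code: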
public function healthCheckAction() {
    try {
        /* @var $DBConn Zend_Db_Adapter_Pdo_Mysql */
        $DBConn = Zend_Registry::get('multidb')->getDb();
        // test guide service (most likely will be from memcache, unlikely to hit db)
        $guideService = $this->_apiGuideService();
        $guideService->isLoaded();
        // this line fails and throws an exception
        // I put host in here just so an error would include it in throw during this phase instead of catch phase (where it works)
        // test raw db connection
        $dbh = new PDO("mysql:host={$DBConn->getConfig()['host']};dbname={$DBConn->getConfig()['dbname']}", $DBConn->getConfig()['username'], $DBConn->getConfig()['password']);
        $data = $dbh->query("SELECT '{$DBConn->getConfig()['host']}' as host, now()")->fetchObject();
        // test database connectivity
        // I put host in here just so an error would include it in throw during this phase instead of catch phase (where it works)
        $sql = "SELECT '{$DBConn->getConfig()['host']}' as host, now()";
        $DBConn->fetchRow($sql);
        // test cache
        /* @var $cache Zend_Cache_Core */
        $cache = Zend_Registry::get('cachemanager')->getCache('default');
        if (!$cache->load('health_check')) {
            $cache->save(true, 'health_check');
        }
        echo 'Instance is healthy';
    }
    catch (Exception $e) {
        header('HTTP/1.1 500 Internal Server Error');
        echo 'Instance is unhealthy';
        // get instance ip
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://169.254.169.254/latest/meta-data/public-ipv4');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $ip = curl_exec($ch);
        // get instance id
        curl_setopt($ch, CURLOPT_URL, 'http://169.254.169.254/latest/meta-data/instance-id');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $instance = curl_exec($ch);
        // email us some info
        $message = "Instance $instance failed health check. ssh ubuntu@$ip to investigate<br><br>" . $e->getLine() . " " . $e->getCode() . "<br>" . $e->getMessage() . "<br>" . $e->getTraceAsString(). "<br><br>";
        ob_start();
        // this works and returns access denied, not timeout
        $this->runCommand('mysql -u examplecom_platform -h sg-rds-example.us-east-1.rds.amazonaws.com');
        echo "testing DB with php<br>";
        try {
            echo "write host: " . $DBConn->getConfig()['host'] . "<br>";
            echo "read host: " . $DBConn->getConfig()['host'] . "<br>";
            $dbh = new PDO("mysql:host={$DBConn->getConfig()['host']};dbname={$DBConn->getConfig()['dbname']}", $DBConn->getConfig()['username'], $DBConn->getConfig()['password']);
            $data = $dbh->query('select now()')->fetchObject();
            echo "query database without zend:<br>";
            print_r($data);
            // this line works and prints out
            // stdClass Object
            // (
            // [now()] => 2018-01-09 14:47:12
            // )
            $dbh = null;
        } catch (PDOException $e) {
            print "Error: " . $e->getMessage() . "<br/>";
        }
        // these all work/show correct IP
        $this->runCommand('nc -vz sg-rds-example.us-east-1.rds.amazonaws.com 3306');
        $this->runCommand('host sg-rds-example.us-east-1.rds.amazonaws.com');
        $this->runCommand('dig sg-rds-example.us-east-1.rds.amazonaws.com');
        $debug = ob_get_contents();
        ob_end_clean();
        $message .= "<br><br>" . str_replace("\n", "<br>", $debug);
        $mail = new Zend_Mail();
        $mail->setSubject('[examplecom] Instance Failed Healthcheck v2')
            ->setFrom('noreply@example.com')
            ->addTo('alerts@example.com')
            ->setBodyHtml($message)
            ->send();
    }
}
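As I kept debugging, I added more and more things to test the connection. The try block throws the error SQLSTATE[HY000] [2002] Connection timed out, but this exact same connection works in the catch block and is able to query now() from the database.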
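This is where I'm stumped: how can the same process time out on its first connection attempt but succeed during the error catch?
Also, I only receive 1 or 2 of these emails saying it cannot connect. By the time I can log in and test a few things, the instance is running and the connection works fine. The health check reports healthy and the instance stays in the ELB.
Any ideas or suggestions for adding more debugging?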
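One direction for extra debugging, as a minimal sketch only: time each connection attempt and record its error, so the alert email shows whether the first attempt really waits out a full timeout and how quickly a retry succeeds. The helper name probeWithTimings, its parameters, and the 5-second PDO::ATTR_TIMEOUT in the usage comment are assumptions made for illustration. They are not part of Zend Framework or the application above.

```php
<?php
/**
 * Hypothetical helper: run $attempt up to $maxAttempts times and record,
 * for each try, how long it took and which exception (if any) it threw.
 * Written to stay compatible with PHP 7.0.
 */
function probeWithTimings(callable $attempt, $maxAttempts = 3, $delayMs = 200)
{
    $log = [];
    for ($i = 1; $i <= $maxAttempts; $i++) {
        $start = microtime(true);
        $error = null;
        try {
            $attempt($i);
        } catch (Exception $e) {
            // PDOException extends Exception, so connection errors land here
            $error = get_class($e) . ': ' . $e->getMessage();
        }
        $log[] = [
            'attempt' => $i,
            'seconds' => round(microtime(true) - $start, 3),
            'error'   => $error,
        ];
        if ($error === null) {
            break; // stop at the first successful attempt
        }
        if ($i < $maxAttempts) {
            usleep($delayMs * 1000); // pause before retrying
        }
    }
    return $log;
}

// Possible use inside the health check (assumes $DBConn as in the code above):
// $cfg = $DBConn->getConfig();
// $log = probeWithTimings(function () use ($cfg) {
//     // record what the hostname resolves to at the moment of the attempt
//     error_log('resolved: ' . implode(',', gethostbynamel($cfg['host']) ?: []));
//     new PDO(
//         "mysql:host={$cfg['host']};dbname={$cfg['dbname']}",
//         $cfg['username'],
//         $cfg['password'],
//         [PDO::ATTR_TIMEOUT => 5] // assumed connect timeout, in seconds
//     );
// });
// $message .= '<pre>' . print_r($log, true) . '</pre>';
```

If the first entry shows a long elapsed time and a later one succeeds within milliseconds, that would point at something transient during boot (networking or DNS not yet settled, for instance) rather than at the configuration itself. That interpretation is a guess to be checked against the logged timings.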