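When exporting from Cloud SQL using a RESTful call to the Google API, we are seeing random failures in roughly one out of every 10 attempts. The log excerpt below, taken from Stackdriver, gives no insight into the root cause: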
{
insertId: "55C4B416A7642.A1B5D8E.570939C6"
logName: "projects/my-dev-project/logs/cloudaudit.googleapis.com%2Fdata_access"
protoPayload: {
@type: "type.googleapis.com/google.cloud.audit.AuditLog"
authenticationInfo: {
principalEmail: "jsmith@mycompany.com"
}
authorizationInfo: [
0: {
authorizationLoggingOptions: {
permissionType: "DATA_READ"
}
granted: true
permission: "cloudsql.instances.export"
resource: "instances/my-db-instance"
}
]
methodName: "cloudsql.instances.export"
request: {
@type: "type.googleapis.com/cloudsql.admin.InstancesExportRequest"
exportContext: {
database: [
0: "business_db"
]
fileType: "CSV"
schemaOnly: true
selectQuery:
"SELECT
replace(IFNULL(`ID`,'<NULL_VALUE>'), '', '') AS `ID`,
replace(IFNULL(`NAME`,'<NULL_VALUE>'), '', '') AS `NAME`,
replace(IFNULL(`ADDRESS`,'<NULL_VALUE>'), '', '') AS `ADDRESS`,
replace(IFNULL(`DOB`,'<NULL_VALUE>'), '', '') AS `DOB`
FROM CUSTOMERS"
table: [
0: "CUSTOMERS"
]
uri: "gs://my-dev-project-stage/CUSTOMERS_full.csv"
}
instanceName: {
fullProjectId: "my-dev-project"
instanceId: "my-db-instance"
}
}
requestMetadata: {
callerIp: "107.178.192.158"
}
resourceName: "instances/my-db-instance"
serviceName: "cloudsql.googleapis.com"
status: {
code: 2
message: "UNKNOWN"
}
}
receiveTimestamp: "2017-10-24T13:52:53.738754270Z"
resource: {
labels: {
database_id: "my-dev-project:my-db-instance"
project_id: "my-dev-project"
region: "europe"
}
type: "cloudsql_database"
}
severity: "ERROR"
timestamp: "2017-10-24T13:52:53.317Z"
}
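The only differences between this log entry and one for a successful export are the three property values mentioned in the post title (the status code, the status message, and the severity). In a successful entry they look like this: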
status: {
}
}
receiveTimestamp: "2017-10-24T14:21:44.997050352Z"
resource: {
labels: {
database_id: "my-dev-project:my-db-instance"
project_id: "my-dev-project"
region: "europe"
}
type: "cloudsql_database"
}
severity: "INFO"
timestamp: "2017-10-24T14:21:43.757Z"
}
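The intermittent nature of the failures suggests that the root cause is an external factor. That is, exactly the same request with the same parameters (connection details, output file location, and SQL statement) will work a few minutes later. There is no consistent pattern to the export failures.
We have already discovered that Cloud SQL does not support multiple simultaneous exports, and we have implemented a queueing system to avoid that problem.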
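For anyone wanting to reproduce the setup, the queueing idea can be sketched roughly as follows. This is a minimal illustration and not our production code: `SerialExportQueue` and the `run_export` callable are hypothetical names, and `run_export` is assumed to start one export and block until that export has finished.

```python
import queue
import threading


class SerialExportQueue:
    """Runs submitted export jobs one at a time on a single worker thread."""

    def __init__(self, run_export):
        # run_export is a hypothetical callable: it is assumed to start one
        # export and to return only once that export has completed.
        self._run_export = run_export
        self._jobs = queue.Queue()
        self.results = []  # (job, outcome) pairs in completion order
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, job):
        """Enqueue a job; it runs after all previously submitted jobs."""
        self._jobs.put(job)

    def join(self):
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _drain(self):
        # Only this one thread ever calls run_export, so two exports
        # can never be in flight at the same time.
        while True:
            job = self._jobs.get()
            try:
                self.results.append((job, self._run_export(job)))
            except Exception as exc:
                # Record the failure and carry on with the next job.
                self.results.append((job, exc))
            finally:
                self._jobs.task_done()
```

A single worker thread is the simplest way to guarantee that exports never overlap. A failed job is recorded in `results` rather than stopping the queue, so one bad export does not block the ones behind it.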
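Data volume does not appear to be a factor either. An export will succeed for a 100,000-row file and then fail for a 500-row file. In fact, smaller or even empty datasets seem more likely to fail.
If anyone has ideas or suggestions for a solution, or for avenues to investigate, I would be interested to hear them. Thanks.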