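I have been searching Google for a long time and have not found a way to export my backups (stored in a bucket) to BigQuery without doing it manually.
Is it possible to do this?
Thanks a lot!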
Answer 0 (score: 0)
You should be able to do this through the python-bigquery API.
First, you need to establish a connection to the BigQuery service. This is the code I use to do that:
import httplib2
from googleapiclient.discovery import build
# SignedJwtAssertionCredentials is only available in oauth2client < 2.0.
from oauth2client.client import SignedJwtAssertionCredentials


class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        # Read the service account's private key from disk.
        with open(self._key_filename) as key_file:
            key = key_file.read()
        credentials = SignedJwtAssertionCredentials(
            self._account_email,
            key,
            'https://www.googleapis.com/auth/bigquery')
        # Authorize an HTTP client and build the BigQuery v2 service object.
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)
After that, you can use self.connector to run jobs (you will find some examples in this answer).
To load your backup from Google Cloud Storage, you have to define the job configuration, like this:
body = {
    "configuration": {
        "load": {
            # One of "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
            "sourceFormat": "CSV",
            "fieldDelimiter": ",",  # only if the data is comma separated
            "destinationTable": {
                "projectId": "your_project_id",
                "tableId": "your_table_to_save_the_data",
                "datasetId": "your_dataset_id",
            },
            "writeDisposition": "WRITE_TRUNCATE",  # or "WRITE_APPEND"
            "sourceUris": [
                # Path to your backup in Google Cloud Storage.
                # Notice you can use the '*' wildcard.
                "gs://bucket_name/filename*",
            ],
            # [Optional] The schema for the destination table. It can be omitted
            # if the destination table already exists, or if you are loading
            # data from Google Cloud Datastore.
            "schema": {
                "fields": [  # Describes the fields in a table.
                    {
                        # [Optional] Nested TableFieldSchema objects, used
                        # when the type is RECORD.
                        "fields": [],
                        # [Required] STRING, BYTES, INTEGER, FLOAT, BOOLEAN,
                        # TIMESTAMP or RECORD (a nested schema).
                        "type": "A String",
                        # [Optional] At most 16K characters.
                        "description": "A String",
                        # [Required] Only letters (a-z, A-Z), numbers (0-9) or
                        # underscores (_); must start with a letter or
                        # underscore; at most 128 characters.
                        "name": "A String",
                        # [Optional] NULLABLE (the default), REQUIRED or REPEATED.
                        "mode": "A String",
                    },
                ],
            },
        },
    },
}
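If you have several backups to load, the same configuration can be assembled by a small helper instead of being edited by hand each time. This is only a sketch: the function name build_load_body and its parameters are made up for illustration and are not part of the BigQuery API. Only the dictionary keys come from the load configuration described above.

```python
def build_load_body(project_id, dataset_id, table_id, source_uris,
                    source_format="DATASTORE_BACKUP",
                    write_disposition="WRITE_TRUNCATE",
                    schema_fields=None):
    """Return a request body for a BigQuery load job (hypothetical helper)."""
    # Accept a single URI as well as a list of URIs.
    if isinstance(source_uris, str):
        source_uris = [source_uris]
    load = {
        "sourceFormat": source_format,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "writeDisposition": write_disposition,
        "sourceUris": list(source_uris),
    }
    # The delimiter only makes sense for CSV sources.
    if source_format == "CSV":
        load["fieldDelimiter"] = ","
    # The schema is optional, e.g. when the destination table already exists.
    if schema_fields:
        load["schema"] = {"fields": schema_fields}
    return {"configuration": {"load": load}}
```

For example, build_load_body("your_project_id", "your_dataset_id", "your_table", "gs://bucket_name/filename*") would give a body without a schema, which matches the case where the schema can be omitted.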
Then run:
self.connector.jobs().insert(projectId=self._project_id, body=body).execute()
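Note that insert only submits the job; the load itself runs asynchronously. If you need to know when it has finished, you can poll jobs().get until the job state is "DONE". A minimal sketch of that, where wait_for_job, poll_seconds and max_polls are names I made up for illustration:

```python
import time


def wait_for_job(connector, project_id, job_id, poll_seconds=5, max_polls=120):
    """Poll a BigQuery job until it is DONE (hypothetical helper)."""
    for _ in range(max_polls):
        job = connector.jobs().get(projectId=project_id, jobId=job_id).execute()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            # A finished job may still have failed; errorResult says why.
            if "errorResult" in status:
                raise RuntimeError("Load job failed: %s" % status["errorResult"])
            return job
        time.sleep(poll_seconds)
    raise RuntimeError("Gave up waiting for job %s" % job_id)
```

The job id to pass in can be read from the response of the insert call, under ["jobReference"]["jobId"].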
Hope this is what you were looking for. If you run into any problems, let us know.