Python GAE - How to programmatically export data from a backup to BigQuery?

Asked: 2016-05-18 19:34:36

Tags: python google-app-engine google-bigquery

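I have been googling for a long time, and I have not found a way to export my backup (stored in a bucket) to BigQuery without doing it manually...

Is it possible to do this?

Thank you very much!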

1 answer:

Answer 0: (score: 0)

You should be able to do this through the python-bigquery api.

First, you need to establish a connection to the BigQuery service. This is the code I use to do that:

import httplib2
from googleapiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials


class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        # Read the service account's private key from disk.
        with open(self._key_filename, 'rb') as key_file:
            key = key_file.read()
        # Note: SignedJwtAssertionCredentials only exists in oauth2client < 2.0.
        credentials = SignedJwtAssertionCredentials(
            self._account_email,
            key,
            'https://www.googleapis.com/auth/bigquery')
        # Authorize an HTTP client and build the BigQuery v2 service object.
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)

After that, you can run jobs using self.connector (in this answer you will find some examples).

To load the backup from Google Cloud Storage, you have to define the job configuration, like this:

body = {
  "configuration": {
    "load": {
      # One of "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
      "sourceFormat": "DATASTORE_BACKUP",
      # "fieldDelimiter": ",",  # CSV sources only (if it's comma separated)
      "destinationTable": {
        "projectId": "your_project_id",
        "tableId": "your_table_to_save_the_data",
        "datasetId": "your_dataset_id",
      },
      "writeDisposition": "WRITE_TRUNCATE",  # or "WRITE_APPEND"
      "sourceUris": [
        # The path to your backup in Google Cloud Storage. For CSV, JSON or
        # Avro it can be something like "gs://bucket_name/filename*" (notice
        # you can use the '*' operator). For a Datastore backup, point at the
        # single .backup_info file instead; wildcards are not accepted there.
        "gs://bucket_name/filename.backup_info",
      ],
      # "schema" is optional: it can be omitted if the destination table
      # already exists, or if you're loading data from Google Cloud Datastore.
      # Otherwise describe each field of the table:
      # "schema": {
      #   "fields": [
      #     {
      #       # [Required] Only letters (a-z, A-Z), numbers (0-9) or
      #       # underscores (_); must start with a letter or underscore;
      #       # at most 128 characters.
      #       "name": "field_name",
      #       # [Required] STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP
      #       # or RECORD (RECORD means the field contains a nested schema).
      #       "type": "STRING",
      #       # [Optional] NULLABLE (the default), REQUIRED or REPEATED.
      #       "mode": "NULLABLE",
      #       # [Optional] At most 16K characters.
      #       "description": "field description",
      #       # [Optional] Nested schema fields, if the type is RECORD.
      #       "fields": [],
      #     },
      #   ],
      # },
    },
  },
}
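If you load backups regularly, it may be convenient to build that configuration from a small function instead of editing the dictionary by hand. The following is only a sketch: `build_load_body` and all of its argument names are made up for illustration, and the project, dataset, table and bucket names are placeholders. It assumes a Datastore backup, so it leaves out the schema and the CSV-only options.

```python
def build_load_body(project_id, dataset_id, table_id, source_uri,
                    source_format="DATASTORE_BACKUP",
                    write_disposition="WRITE_TRUNCATE"):
    """Return a request body for a BigQuery load job (hypothetical helper)."""
    return {
        "configuration": {
            "load": {
                "sourceFormat": source_format,
                # A single URI here; pass a list yourself if you need several.
                "sourceUris": [source_uri],
                "writeDisposition": write_disposition,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Placeholder names, replace them with your own.
body = build_load_body("my-project", "my_dataset", "my_table",
                       "gs://my-bucket/my-backup.backup_info")
```

The resulting `body` has the same shape as the hand-written configuration, so it can be passed to the load job in the same way.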

Then run:

self.connector.jobs().insert(projectId=self._project_id, body=body).execute()
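Keep in mind that the insert call only starts the load job; it returns before the data is actually loaded. If you need to know when it has finished, one option is to poll the job with jobs().get(). This is a rough sketch, not something from my own code: `wait_for_job`, `poll_seconds` and `max_polls` are names I made up, and it assumes the response of the insert call contains the job ID under `jobReference`.

```python
import time


def wait_for_job(connector, project_id, job_id, poll_seconds=5, max_polls=120):
    """Poll a BigQuery job until its state is DONE (hypothetical helper).

    Raises RuntimeError if the job ends with an error, or if it has not
    finished after max_polls attempts.
    """
    for _ in range(max_polls):
        job = connector.jobs().get(projectId=project_id, jobId=job_id).execute()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            # A finished job that failed carries an "errorResult" entry.
            if "errorResult" in status:
                raise RuntimeError(status["errorResult"])
            return job
        time.sleep(poll_seconds)
    raise RuntimeError("Job %s did not finish in time" % job_id)
```

You would keep the response of the insert call, read `response["jobReference"]["jobId"]` from it, and pass that ID to `wait_for_job` together with the connector and your project ID.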

I hope this is what you were looking for. If you run into any problems, let us know.