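Automatically generating a BigQuery schema from a Python dictionary

Time: 2019-05-10 14:43:55

Tags: python google-bigquery

How can I automatically generate a BigQuery table schema from a Python dictionary?

For example: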

dict = {'data': 'some_data', 'me': 8}
schema = BigQuery.generateSchema(dict)

#schema is now:
# {'fields': [
#    {'name': 'data', 'type': 'STRING', 'mode': 'NULLABLE'},
#    {'name': 'me', 'type': 'INT', 'mode': 'NULLABLE'}
# ]}

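Does something like this exist?

1 answer:

Answer 0: (score: 1)

At the time of writing, the BigQuery Python library has no method for this.

Here is a recursive function that does it.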

import datetime
from google.cloud.bigquery.schema import SchemaField

# [START] map_dict_to_bq_schema

# FieldType Map Dictionary
field_type = {
        str: 'STRING',
        bytes: 'BYTES',
        int: 'INTEGER',
        float: 'FLOAT',
        bool: 'BOOLEAN',
        datetime.datetime: 'DATETIME',
        datetime.date: 'DATE',
        datetime.time: 'TIME',
        dict: 'RECORD',
}


# Take a dictionary and return a BigQuery schema
# as a list of SchemaField objects
def map_dict_to_bq_schema(source_dict):

    # SchemaField list
    schema = []

    # Iterate over the existing dictionary
    for key, value in source_dict.items():

        mode = 'NULLABLE'
        sample = value

        # A list or tuple maps to a REPEATED field; its type is
        # inferred from the first element
        if isinstance(value, (list, tuple)):
            if not value:
                # The element type of an empty list cannot be inferred
                continue
            mode = 'REPEATED'
            sample = value[0]

        try:
            bq_type = field_type[type(sample)]
        except KeyError:
            raise TypeError('Unsupported type for field %r: %r' % (key, type(sample)))

        # If it is a STRUCT / RECORD field we start the recursion
        nested = map_dict_to_bq_schema(sample) if bq_type == 'RECORD' else ()

        # Add the field to the list of fields
        schema.append(SchemaField(key, bq_type, mode=mode, fields=nested))

    # Return the list of SchemaField objects
    return schema

# [END] map_dict_to_bq_schema
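
A note on the lookup `field_type[type(value)]`: it keys on the exact type rather than using `isinstance`, which matters because `bool` is a subclass of `int` and `datetime.datetime` is a subclass of `datetime.date`. The snippet below is only a small illustration of that point; `bq_type_of` is a hypothetical helper and the map is a shortened copy of the one above.

```python
import datetime

# isinstance() also matches subclasses, so it cannot tell these apart
assert isinstance(True, int)
assert isinstance(datetime.datetime(2019, 5, 10), datetime.date)

# Shortened copy of the answer's type map (for illustration only)
field_type = {
    int: 'INTEGER',
    bool: 'BOOLEAN',
    datetime.datetime: 'DATETIME',
    datetime.date: 'DATE',
}


def bq_type_of(value):
    """Hypothetical helper: look up the BigQuery type by exact Python type."""
    return field_type[type(value)]


print(bq_type_of(True))                                # BOOLEAN
print(bq_type_of(8))                                   # INTEGER
print(bq_type_of(datetime.datetime(2019, 5, 10, 14)))  # DATETIME
print(bq_type_of(datetime.date(2019, 5, 10)))          # DATE
```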

Examples:



>>> map_dict_to_bq_schema({'data': 'some_data', 'me': 8})
[SchemaField('data', 'STRING', 'NULLABLE', None, ()), SchemaField('me', 'INTEGER', 'NULLABLE', None, ())]


>>> map_dict_to_bq_schema({'data': {'data2': 'some_data', 'me2': 8}, 'me': 8, 'h':[5,6,7]})
[SchemaField('h', 'INTEGER', 'REPEATED', None, ()), SchemaField('me', 'INTEGER', 'NULLABLE', None, ()), SchemaField('data', 'RECORD', 'NULLABLE', None, [SchemaField('data2', 'STRING', 'NULLABLE', None, ()), SchemaField('me2', 'INTEGER', 'NULLABLE', None, ())])]

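For this answer I used @luckylwk's code from the question "How to map a Python Dict to a Big Query Schema" as a reference, specifically for nested and repeated columns.

Also, take a look at the SchemaField class in the BQ Python library. From there you can get the schema in whichever format you need: for the Python client, for the CLI, or for whatever matches your use case.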
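
If you would rather have a plain dictionary structure like the one in the question (for example, to dump it to a JSON schema file), the same recursion can be sketched without the client library. This is a minimal sketch under stated assumptions, not something the BigQuery library provides: `dict_to_json_schema` and `FIELD_TYPE` are hypothetical names, the type of a repeated field is assumed to be that of its first element, and the `name` / `type` / `mode` / `fields` layout is assumed to be the one your tooling expects.

```python
import datetime
import json

# Hypothetical type map for this sketch, mirroring the answer's field_type
FIELD_TYPE = {
    str: 'STRING',
    bytes: 'BYTES',
    int: 'INTEGER',
    float: 'FLOAT',
    bool: 'BOOLEAN',
    datetime.datetime: 'DATETIME',
    datetime.date: 'DATE',
    datetime.time: 'TIME',
    dict: 'RECORD',
}


def dict_to_json_schema(source_dict):
    """Return a list of plain field dicts describing source_dict."""
    fields = []
    for key, value in source_dict.items():
        mode = 'NULLABLE'
        sample = value
        if isinstance(value, (list, tuple)):
            if not value:
                continue  # element type of an empty list is unknown
            mode = 'REPEATED'
            sample = value[0]
        field = {'name': key, 'type': FIELD_TYPE[type(sample)], 'mode': mode}
        if field['type'] == 'RECORD':
            # Nested dictionaries become nested field lists
            field['fields'] = dict_to_json_schema(sample)
        fields.append(field)
    return fields


schema = dict_to_json_schema({'data': 'some_data', 'me': 8})
print(json.dumps(schema, indent=2))
```

Fields come out in the dictionary's iteration order, and a value whose type is not in the map raises a `KeyError`.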