How can I automatically generate a BigQuery table schema from a Python dictionary?
For example:
dict = {'data': 'some_data', 'me': 8}
schema = BigQuery.generateSchema(dict)
#schema is now:
# {'fields': [
# {'name': 'data', 'type': 'STRING', 'mode': 'NULLABLE'},
# {'name': 'me', 'type': 'INT', 'mode': 'NULLABLE'}
# ]}
Does something like this exist?
Answer 0: (score: 1)
Currently, the BigQuery Python library has no method for this.
Here is a recursive function that does it.
import datetime

from google.cloud.bigquery.schema import SchemaField

# [START] map_dict_to_bq_schema
# Map Python types to BigQuery field types
field_type = {
    str: 'STRING',
    bytes: 'BYTES',
    int: 'INTEGER',
    float: 'FLOAT',
    bool: 'BOOLEAN',
    datetime.datetime: 'DATETIME',
    datetime.date: 'DATE',
    datetime.time: 'TIME',
    dict: 'RECORD',
}


# Take a dictionary and return a BigQuery schema (a list of SchemaField)
def map_dict_to_bq_schema(source_dict):
    # SchemaField list
    schema = []
    # Iterate over the existing dictionary
    for key, value in source_dict.items():
        mode = 'NULLABLE'  # NULLABLE BY DEFAULT
        if isinstance(value, (list, tuple)):
            # A list is treated as a REPEATED field
            if not value:
                # An empty list carries no type information, so skip it
                continue
            mode = 'REPEATED'
            # The field type is taken from the first element
            value = value[0]
        bq_type = field_type[type(value)]
        # If it is a STRUCT / RECORD field we start the recursion
        fields = map_dict_to_bq_schema(value) if bq_type == 'RECORD' else ()
        # Add the field to the list of fields
        schema.append(SchemaField(key, bq_type, mode=mode, fields=fields))
    # Return the list of fields
    return schema
# [END] map_dict_to_bq_schema
Example:
>>> map_dict_to_bq_schema({'data': 'some_data', 'me': 8})
# Output
[SchemaField('data', 'STRING', 'NULLABLE', None, ()), SchemaField('me', 'INTEGER', 'NULLABLE', None, ())]
>>> map_dict_to_bq_schema({'data': {'data2': 'some_data', 'me2': 8}, 'me': 8, 'h':[5,6,7]})
# Output (field order follows the dictionary's iteration order; the exact repr depends on the library version)
[SchemaField('h', 'INTEGER', 'REPEATED', None, ()), SchemaField('me', 'INTEGER', 'NULLABLE', None, ()), SchemaField('data', 'RECORD', 'NULLABLE', None, [SchemaField('data2', 'STRING', 'NULLABLE', None, ()), SchemaField('me2', 'INTEGER', 'NULLABLE', None, ())])]
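I used @luckylwk's code from this question as a reference: How to map a Python Dict to a Big Query Schema, which deals specifically with nested and repeated columns.
Also, take a look at the SchemaField class in the BQ Python library. From there you can get the schema in the format you need for the Python client, the CLI, or whatever matches your use case.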
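If you want the plain dictionary format shown in the question rather than SchemaField objects, the same recursion can be done without the client library. This is only a rough sketch: the name dict_to_json_schema_fields is made up for illustration, the type of a repeated field is assumed to be that of its first element, and empty lists are skipped because they carry no type information.

```python
import datetime
import json

# Python type -> BigQuery type name (same mapping as in the answer)
FIELD_TYPES = {
    str: 'STRING',
    bytes: 'BYTES',
    int: 'INTEGER',
    float: 'FLOAT',
    bool: 'BOOLEAN',
    datetime.datetime: 'DATETIME',
    datetime.date: 'DATE',
    datetime.time: 'TIME',
    dict: 'RECORD',
}


def dict_to_json_schema_fields(source_dict):
    """Return a list of field dicts with 'name', 'type', 'mode' and,
    for RECORD fields, a nested 'fields' list.

    Hypothetical helper, not part of google-cloud-bigquery.
    """
    fields = []
    for key, value in source_dict.items():
        mode = 'NULLABLE'
        if isinstance(value, (list, tuple)):
            if not value:
                # Assumption: an empty list gives no type information, skip it
                continue
            mode = 'REPEATED'
            # Assumption: every element has the same type as the first one
            value = value[0]
        field = {'name': key, 'type': FIELD_TYPES[type(value)], 'mode': mode}
        if field['type'] == 'RECORD':
            # Recurse into the nested dictionary
            field['fields'] = dict_to_json_schema_fields(value)
        fields.append(field)
    return fields


row = {'data': 'some_data', 'me': 8}
schema = {'fields': dict_to_json_schema_fields(row)}
print(json.dumps(schema, indent=2))
```

The list returned by the helper is the shape a JSON schema file for the bq CLI uses; wrapping it in {'fields': [...]} gives the structure from the question. A value whose type is not in the mapping (None, for instance) raises a KeyError here, so you may want to handle that case for your own data.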