I am trying to follow this example: http://ajkannan.github.io/gcloud-python/latest/bigquery-usage.html
But when I try to create a table:
import os
import subprocess
import sys
from gcloud.bigquery import SchemaField
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "toto.json"
os.environ['GCLOUD_PROJECT'] = 'titi'
from gcloud import pubsub
client = pubsub.Client('titi')
# Imports the Google Cloud client library
from google.cloud import bigquery
# Instantiates a client
bigquery_client = bigquery.Client()
# The name for the new dataset
dataset_name = 'tata'
dataset = bigquery_client.dataset(dataset_name)
table = dataset.table(name='aspire_page')
table.schema = [
SchemaField(name= 'id', type= 'int', mode= 'nullable'),
SchemaField(name= 'zip', type= 'string', mode= 'nullable'),
SchemaField(name= 'html', type= 'string', mode= 'nullable'),
SchemaField(name= 'url', type= 'string', mode= 'nullable'),
SchemaField(name= 'categorie', type= 'string', mode= 'nullable'),
SchemaField(name= 'date', type= 'string', mode= 'nullable'),
SchemaField(name='name', type= 'string', mode= 'nullable'),
]
table.create()
I get:
TypeError Traceback (most recent call last)
<ipython-input-10-30edba459053> in <module>()
23
24 table.schema = [
---> 25 SchemaField(name= 'id', type= 'int', mode= 'nullable'),
26 SchemaField(name= 'zip', type= 'string', mode= 'nullable'),
27 SchemaField(name= 'html', type= 'string', mode= 'nullable'),
TypeError: __init__() got an unexpected keyword argument 'type'
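I don't understand why SchemaField needs a type at initialization...
If anyone has an idea, I'd appreciate it.
Thanks and regards
Edit:
Even @andre622's suggestion doesn't work: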
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-f177aa490fbb> in <module>()
29 SchemaField('categorie', 'STRING', mode= 'nullable'),
30 SchemaField('date', 'STRING', mode= 'nullable'),
---> 31 SchemaField('name', 'STRING', mode= 'nullable'),
32 ]
33
/usr/local/lib/python3.5/dist-packages/google/cloud/bigquery/table.py in schema(self, value)
113 """
114 if not all(isinstance(field, SchemaField) for field in value):
--> 115 raise ValueError('Schema items must be fields')
116 self._schema = tuple(value)
117
ValueError: Schema items must be fields
Even with the other suggested version:
import os
import subprocess
import sys
from gcloud.bigquery import SchemaField
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "toto.json"
os.environ['GCLOUD_PROJECT'] = 'titi'
from gcloud import pubsub
client = pubsub.Client('titi')
# Imports the Google Cloud client library
from google.cloud import bigquery
# Instantiates a client
bigquery_client = bigquery.Client()
# The name for the new dataset
dataset_name = 'choual'
# Prepares the new dataset
dataset = bigquery_client.dataset(dataset_name)
table = dataset.table(name='aspire_page')
table.schema = [
SchemaField('id','INTEGER'),
SchemaField('zip', 'STRING'),
SchemaField('html', 'STRING'),
SchemaField('url', 'STRING'),
SchemaField('categorie', 'STRING'),
SchemaField('date', 'STRING'),
SchemaField('name', 'STRING')
]
table.create()
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-191573ca7711> in <module>()
29 SchemaField('categorie', 'STRING'),
30 SchemaField('date', 'STRING'),
---> 31 SchemaField('name', 'STRING')
32 ]
33
/usr/local/lib/python3.5/dist-packages/google/cloud/bigquery/table.py in schema(self, value)
113 """
114 if not all(isinstance(field, SchemaField) for field in value):
--> 115 raise ValueError('Schema items must be fields')
116 self._schema = tuple(value)
117
ValueError: Schema items must be fields
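Answer 0 (score: 1)
Taken from the GitHub source: SchemaField does not take type, it takes field_type, which is what caused the error before @andre622's suggestion:
(Note that I did not write the following code. All of it belongs to Google Inc. and is released under the Apache License.)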
"""Describe a single field within a table schema.
:type name: str
:param name: the name of the field.
:type field_type: str
:param field_type: the type of the field (one of 'STRING', 'INTEGER',
'FLOAT', 'BOOLEAN', 'TIMESTAMP' or 'RECORD').
:type mode: str
:param mode: the type of the field (one of 'NULLABLE', 'REQUIRED',
or 'REPEATED').
:type description: str
:param description: optional description for the field.
:type fields: list of :class:`SchemaField`, or None
:param fields: subfields (requires ``field_type`` of 'RECORD').
"""
def __init__(self, name, field_type, mode='NULLABLE', description=None,
fields=None):
self.name = name
self.field_type = field_type
self.mode = mode
self.description = description
self.fields = fields
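To see why the `type` keyword is rejected, here is a minimal sketch. It uses a stand-in class that copies only the `__init__` signature quoted above; it is not the real library class, and the helper `try_build` is a hypothetical name introduced for illustration.

```python
class SchemaField:
    """Stand-in that mirrors only the __init__ signature quoted above."""

    def __init__(self, name, field_type, mode='NULLABLE', description=None,
                 fields=None):
        self.name = name
        self.field_type = field_type
        self.mode = mode
        self.description = description
        self.fields = fields


def try_build(**kwargs):
    """Hypothetical helper: return the new field, or the TypeError raised."""
    try:
        return SchemaField(**kwargs)
    except TypeError as exc:
        return exc


# 'type' is not a parameter of __init__, so Python raises TypeError.
bad = try_build(name='id', type='INTEGER')
# 'field_type' matches the signature; mode falls back to 'NULLABLE'.
good = try_build(name='id', field_type='INTEGER')

print(type(bad).__name__, bad)
print(good.name, good.field_type, good.mode)
```

Passing the first two arguments positionally, as in `SchemaField('id', 'INTEGER')`, avoids the keyword name altogether.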
Since you are using the default mode, you should be able to use:
table.schema = [
SchemaField('id','INTEGER'),
SchemaField('zip', 'STRING'),
SchemaField('html', 'STRING'),
SchemaField('url', 'STRING'),
SchemaField('categorie', 'STRING'),
SchemaField('date', 'STRING'),
SchemaField('name', 'STRING')
]
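As for why it needs a type: the system has to know what kind of data you want to store in that field. In a DBMS this allows space to be allocated correctly for each field, because a row then takes up a specific number of bytes. Knowing the position of the first row and the size of each row then allows random access.
Edit:
You could try this: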
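The fixed-size-row idea can be sketched with Python's `struct` module. This is only an illustration of the general principle, not a description of how BigQuery stores data; the row layout (a 4-byte integer followed by a 10-byte string) and the sample values are assumptions made up for the example.

```python
import struct

# Hypothetical fixed-width layout: 4-byte little-endian int + 10-byte string.
ROW = struct.Struct('<i10s')

rows = [(1, b'75001'), (2, b'69002'), (3, b'13003')]

# Pack all rows back to back; every row occupies exactly ROW.size bytes.
buffer = b''.join(ROW.pack(row_id, zip_code) for row_id, zip_code in rows)


def read_row(data, index):
    """Jump straight to row `index` using offset = index * row size."""
    row_id, zip_code = ROW.unpack_from(data, index * ROW.size)
    # '10s' pads short strings with NUL bytes, so strip the padding.
    return row_id, zip_code.rstrip(b'\x00')


print(ROW.size)              # 14 bytes per row
print(read_row(buffer, 2))   # (3, b'13003')
```

Because each field has a declared type and width, the third row can be read without scanning the first two.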
table = dataset.table('aspire_page', [
SchemaField('id','INTEGER'),
SchemaField('zip', 'STRING'),
SchemaField('html', 'STRING'),
SchemaField('url', 'STRING'),
SchemaField('categorie', 'STRING'),
SchemaField('date', 'STRING'),
SchemaField('name', 'STRING')
])
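Also try using bigquery.SchemaField instead of SchemaField. You import SchemaField from gcloud.bigquery and then import google.cloud.bigquery, so you may have a name clash: a SchemaField that comes from a different package is a different class, and the isinstance check shown in your traceback would then reject it.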
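A minimal sketch of how such a clash can produce "Schema items must be fields". It does not import either real package: the two classes below are stand-ins for `gcloud.bigquery.SchemaField` and `google.cloud.bigquery.SchemaField`, and `make_schema_field_class`, `validate_schema`, and `check` are hypothetical names. Only the isinstance test is taken from the table.py lines in the traceback.

```python
def make_schema_field_class():
    """Build a fresh class named SchemaField, as a separate package would."""
    class SchemaField:
        def __init__(self, name, field_type, mode='NULLABLE'):
            self.name = name
            self.field_type = field_type
            self.mode = mode
    return SchemaField


# Stand-ins only: same class name, but two distinct class objects.
OldSchemaField = make_schema_field_class()  # plays gcloud.bigquery.SchemaField
NewSchemaField = make_schema_field_class()  # plays google.cloud.bigquery.SchemaField


def validate_schema(value):
    """Same check as the table.py excerpt shown in the traceback."""
    if not all(isinstance(field, NewSchemaField) for field in value):
        raise ValueError('Schema items must be fields')
    return tuple(value)


def check(fields):
    """Hypothetical helper: report whether the schema passes validation."""
    try:
        validate_schema(fields)
        return 'accepted'
    except ValueError as exc:
        return str(exc)


print(OldSchemaField.__name__ == NewSchemaField.__name__)  # True
print(check([OldSchemaField('id', 'INTEGER')]))  # Schema items must be fields
print(check([NewSchemaField('id', 'INTEGER')]))  # accepted
```

If this is what happens in the question's script, taking both the client and SchemaField from the same package (for example `bigquery.SchemaField` after `from google.cloud import bigquery`) should make the check pass.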
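Answer 1 (score: 0)
You don't need to supply keyword names for the first two arguments passed to each SchemaField; they can be given positionally. Also, your data types should be spelled the way BigQuery expects them (for example 'INTEGER' rather than 'int'). Your schema should be defined as: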
table.schema = [
SchemaField('id', 'INTEGER', mode= 'nullable'),
SchemaField('zip', 'STRING', mode= 'nullable'),
SchemaField('html', 'STRING', mode= 'nullable'),
SchemaField('url', 'STRING', mode= 'nullable'),
SchemaField('categorie', 'STRING', mode= 'nullable'),
SchemaField('date', 'STRING', mode= 'nullable'),
SchemaField('name', 'STRING', mode= 'nullable'),
]
Answer 2 (score: 0)