Question

So I wanted to try this example: http://ajkannan.github.io/gcloud-python/latest/bigquery-usage.html

But when I try to create a table:

import os
import subprocess
import sys
from gcloud.bigquery import SchemaField
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "toto.json"
os.environ['GCLOUD_PROJECT'] = 'titi'
from gcloud import pubsub

client = pubsub.Client('titi')

# Imports the Google Cloud client library
from google.cloud import bigquery

# Instantiates a client
bigquery_client = bigquery.Client()

# The name for the new dataset
dataset_name = 'tata'

dataset = bigquery_client.dataset(dataset_name)
table = dataset.table(name='aspire_page')

table.schema = [
    SchemaField(name='id', type='int', mode='nullable'),
    SchemaField(name='zip', type='string', mode='nullable'),
    SchemaField(name='html', type='string', mode='nullable'),
    SchemaField(name='url', type='string', mode='nullable'),
    SchemaField(name='categorie', type='string', mode='nullable'),
    SchemaField(name='date', type='string', mode='nullable'),
    SchemaField(name='name', type='string', mode='nullable'),
]


table.create()

I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-30edba459053> in <module>()
     23 
     24 table.schema = [
---> 25      SchemaField(name= 'id', type= 'int', mode= 'nullable'),
     26      SchemaField(name= 'zip', type= 'string', mode= 'nullable'),
     27      SchemaField(name= 'html', type= 'string', mode= 'nullable'),

TypeError: __init__() got an unexpected keyword argument 'type'

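I don't understand why SchemaField has a problem with a type at initialization...

If anyone has an idea...

Thanks and regards

EDIT:

Even @andre622's suggestion doesn't work: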

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-f177aa490fbb> in <module>()
     29   SchemaField('categorie', 'STRING', mode= 'nullable'),
     30  SchemaField('date', 'STRING', mode= 'nullable'),
---> 31  SchemaField('name', 'STRING', mode= 'nullable'),
     32 ]
     33 

/usr/local/lib/python3.5/dist-packages/google/cloud/bigquery/table.py in schema(self, value)
    113         """
    114         if not all(isinstance(field, SchemaField) for field in value):
--> 115             raise ValueError('Schema items must be fields')
    116         self._schema = tuple(value)
    117 

ValueError: Schema items must be fields

And even with the other suggested fix (positional arguments and the default mode):

import os
import subprocess
import sys
from gcloud.bigquery import SchemaField
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "toto.json"
os.environ['GCLOUD_PROJECT'] = 'titi'
from gcloud import pubsub

client = pubsub.Client('titi')

# Imports the Google Cloud client library
from google.cloud import bigquery

# Instantiates a client
bigquery_client = bigquery.Client()

# The name for the new dataset
dataset_name = 'choual'

# Prepares the new dataset
dataset = bigquery_client.dataset(dataset_name)
table = dataset.table(name='aspire_page')

table.schema = [
     SchemaField('id','INTEGER'),
     SchemaField('zip', 'STRING'),
     SchemaField('html', 'STRING'),
     SchemaField('url', 'STRING'),
     SchemaField('categorie', 'STRING'),
     SchemaField('date', 'STRING'),
     SchemaField('name', 'STRING')
]


table.create()

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-191573ca7711> in <module>()
     29      SchemaField('categorie', 'STRING'),
     30      SchemaField('date', 'STRING'),
---> 31      SchemaField('name', 'STRING')
     32 ]
     33 

/usr/local/lib/python3.5/dist-packages/google/cloud/bigquery/table.py in schema(self, value)
    113         """
    114         if not all(isinstance(field, SchemaField) for field in value):
--> 115             raise ValueError('Schema items must be fields')
    116         self._schema = tuple(value)
    117 

ValueError: Schema items must be fields

Answer 1

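Taken from the GitHub source: SchemaField does not take type, it takes field_type. That is what caused the error before @andre622's suggestion:

(Note that I did not write the following code. All of it belongs to Google Inc. and is licensed under the Apache License.)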

class SchemaField(object):
    """Describe a single field within a table schema.

    :type name: str
    :param name: the name of the field.
    :type field_type: str
    :param field_type: the type of the field (one of 'STRING', 'INTEGER',
                       'FLOAT', 'BOOLEAN', 'TIMESTAMP' or 'RECORD').
    :type mode: str
    :param mode: the mode of the field (one of 'NULLABLE', 'REQUIRED',
                 or 'REPEATED').
    :type description: str
    :param description: optional description for the field.
    :type fields: list of :class:`SchemaField`, or None
    :param fields: subfields (requires ``field_type`` of 'RECORD').
    """
    def __init__(self, name, field_type, mode='NULLABLE', description=None,
                 fields=None):
        self.name = name
        self.field_type = field_type
        self.mode = mode
        self.description = description
        self.fields = fields
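To see why the keyword matters, here is a minimal sketch that reproduces the TypeError without any Google library. FakeSchemaField and try_keyword are hypothetical names; the class only copies the constructor signature quoted above, it is not the real SchemaField.

```python
class FakeSchemaField(object):
    """Stand-in with the same constructor signature as the quoted SchemaField."""

    def __init__(self, name, field_type, mode='NULLABLE', description=None,
                 fields=None):
        self.name = name
        self.field_type = field_type
        self.mode = mode
        self.description = description
        self.fields = fields


def try_keyword(**kwargs):
    """Return the TypeError message raised by the constructor, or None on success."""
    try:
        FakeSchemaField(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# 'type' is not a parameter name, so argument binding fails before __init__ runs.
bad = try_keyword(name='id', type='INTEGER')
# 'field_type' matches the signature, so this succeeds.
good = try_keyword(name='id', field_type='INTEGER')
# Positional arguments avoid the naming question entirely; mode gets its default.
positional = FakeSchemaField('id', 'INTEGER')
```

The message in bad contains "unexpected keyword argument 'type'", the same text as the traceback in the question.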

Since you are using the default mode, you should be able to use:

table.schema = [
     SchemaField('id','INTEGER'),
     SchemaField('zip', 'STRING'),
     SchemaField('html', 'STRING'),
     SchemaField('url', 'STRING'),
     SchemaField('categorie', 'STRING'),
     SchemaField('date', 'STRING'),
     SchemaField('name', 'STRING')
]

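As for why it needs a type: it has to know what kind of data you want to store in that field. In a DBMS this lets space be allocated correctly for each field, because a row then occupies a specific number of bytes. Knowing where the first row starts and how big each row is then allows random access.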
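A small sketch of the fixed-size-row idea described above, using Python's struct module. The row layout (a 4-byte integer id plus a 5-byte zip code) and the helper names are hypothetical, and this illustrates the general principle only, not BigQuery's own storage format.

```python
import io
import struct

# Hypothetical fixed-width row: little-endian 4-byte signed int "id" followed
# by a 5-byte zip code. Every row has the same size (ROW.size bytes).
ROW = struct.Struct('<i5s')


def write_rows(buf, rows):
    """Append each (id, zip) pair to buf as one fixed-width record."""
    for row_id, zip_code in rows:
        buf.write(ROW.pack(row_id, zip_code.encode('ascii')))


def read_row(buf, index):
    """Jump straight to row `index` at offset index * ROW.size and decode it."""
    buf.seek(index * ROW.size)
    row_id, zip_bytes = ROW.unpack(buf.read(ROW.size))
    return row_id, zip_bytes.decode('ascii')


buf = io.BytesIO()
write_rows(buf, [(1, '75001'), (2, '69002'), (3, '13003')])
print(ROW.size)          # 9
print(read_row(buf, 2))  # (3, '13003')
```

Because the types fix the size of each row, reading row 2 needs one seek instead of a scan over rows 0 and 1.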

EDIT:

You could try:

table = dataset.table('aspire_page', [
         SchemaField('id','INTEGER'),
         SchemaField('zip', 'STRING'),
         SchemaField('html', 'STRING'),
         SchemaField('url', 'STRING'),
         SchemaField('categorie', 'STRING'),
         SchemaField('date', 'STRING'),
         SchemaField('name', 'STRING')
    ])

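Also try using bigquery.SchemaField instead of SchemaField. Since you import SchemaField from gcloud.bigquery and also import google.cloud.bigquery, you may have a name conflict: the two packages define different SchemaField classes, so the isinstance check in google/cloud/bigquery/table.py rejects fields built with gcloud.bigquery.SchemaField.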
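A self-contained sketch of that name conflict. The two throwaway modules stand in for gcloud.bigquery and google.cloud.bigquery (they are not the real packages), and set_schema mimics the check shown in the traceback. Two classes can share the name SchemaField and still be different objects, so isinstance fails.

```python
import types


def make_module(module_name):
    """Build a throwaway module that defines its own SchemaField class."""
    module = types.ModuleType(module_name)

    class SchemaField(object):
        def __init__(self, name, field_type):
            self.name = name
            self.field_type = field_type

    module.SchemaField = SchemaField
    return module


# Stand-ins for the two packages; each call creates a distinct class object.
old_pkg = make_module('gcloud.bigquery')
new_pkg = make_module('google.cloud.bigquery')


def set_schema(value):
    """Mimics the schema setter check from google/cloud/bigquery/table.py."""
    if not all(isinstance(field, new_pkg.SchemaField) for field in value):
        raise ValueError('Schema items must be fields')
    return tuple(value)


try:
    set_schema([old_pkg.SchemaField('id', 'INTEGER')])
except ValueError as exc:
    print(exc)  # Schema items must be fields

print(len(set_schema([new_pkg.SchemaField('id', 'INTEGER')])))  # 1
```

Using bigquery.SchemaField from the same package as the table avoids the mismatch.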

Answer 2

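You don't need to name the first two arguments you pass to SchemaField (the field name and its type); they can be positional. Also, your data types should be spelled the way BigQuery expects to ingest them, e.g. 'INTEGER' and 'STRING' rather than 'int' and 'string'. Your schema should be defined as: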

table.schema = [
    SchemaField('id', 'INTEGER', mode='nullable'),
    SchemaField('zip', 'STRING', mode='nullable'),
    SchemaField('html', 'STRING', mode='nullable'),
    SchemaField('url', 'STRING', mode='nullable'),
    SchemaField('categorie', 'STRING', mode='nullable'),
    SchemaField('date', 'STRING', mode='nullable'),
    SchemaField('name', 'STRING', mode='nullable'),
]
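If the schema is kept as plain data, a small helper can normalize the spelling before the fields are built. normalize_field and the alias table below are hypothetical; the canonical type and mode names are the ones listed in the SchemaField docstring quoted in the first answer.

```python
# Canonical names listed in the quoted SchemaField docstring.
VALID_TYPES = {'STRING', 'INTEGER', 'FLOAT', 'BOOLEAN', 'TIMESTAMP', 'RECORD'}
VALID_MODES = {'NULLABLE', 'REQUIRED', 'REPEATED'}

# Hypothetical shorthand aliases; extend as needed.
TYPE_ALIASES = {'INT': 'INTEGER', 'STR': 'STRING', 'BOOL': 'BOOLEAN'}


def normalize_field(name, field_type, mode='NULLABLE'):
    """Return (name, TYPE, MODE) upper-cased, with aliases resolved and validated."""
    ftype = field_type.upper()
    ftype = TYPE_ALIASES.get(ftype, ftype)
    fmode = mode.upper()
    if ftype not in VALID_TYPES:
        raise ValueError('unknown field type: %r' % field_type)
    if fmode not in VALID_MODES:
        raise ValueError('unknown mode: %r' % mode)
    return name, ftype, fmode


# Part of the question's schema, written with lowercase names.
spec = [('id', 'int', 'nullable'), ('zip', 'string', 'nullable')]
normalized = [normalize_field(*row) for row in spec]
print(normalized)  # [('id', 'INTEGER', 'NULLABLE'), ('zip', 'STRING', 'NULLABLE')]
```

Each resulting tuple can be passed positionally as SchemaField(*t), since the quoted signature takes name, field_type and mode in that order.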

Answer 3

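We kept running into the same error until we figured it out.

With the latest client library, the way tables are created in BigQuery has changed.

Here are examples from Google using the latest library.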
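A sketch of table creation with the newer API, assuming google-cloud-bigquery 1.x or later, where bigquery.Table accepts a "project.dataset.table" string and Client.create_table sends the request. This is not the linked Google example itself; COLUMNS and create_aspire_table are illustrative names. It imports only from google.cloud.bigquery, so the gcloud.bigquery SchemaField conflict cannot occur.

```python
# Column names and types from the question.
COLUMNS = [
    ('id', 'INTEGER'),
    ('zip', 'STRING'),
    ('html', 'STRING'),
    ('url', 'STRING'),
    ('categorie', 'STRING'),
    ('date', 'STRING'),
    ('name', 'STRING'),
]


def create_aspire_table(table_id):
    """Create the table; table_id is a full ID such as 'titi.tata.aspire_page'."""
    # Imported inside the function so this sketch can be loaded even where
    # google-cloud-bigquery is not installed.
    from google.cloud import bigquery

    # Picks up Application Default Credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS).
    client = bigquery.Client()
    schema = [bigquery.SchemaField(name, field_type, mode='NULLABLE')
              for name, field_type in COLUMNS]
    table = bigquery.Table(table_id, schema=schema)
    # Sends the API request and returns the created Table.
    return client.create_table(table)
```

Usage would be create_aspire_table('titi.tata.aspire_page'), with the project and dataset names taken from the question.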

Unexpected keyword argument 'type' for bigquery
