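BigQuery: getting a query's schema without actually running it

Time: 2016-02-15 18:09:59

Tags: google-bigquery

Is there a way to get the schema of a BigQuery query without actually running it? (I have tried DryRun, but it only returns statistics, not the actual schema.)

3 answers: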

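Answer 0: (score: 2)

There is no good way to get the schema without running the query. There is, however, a hacky way.

You can create a view from the query you want to check. The view will then have the schema that running the query would produce. You can delete the view once you are done.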
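
The create/read/delete cycle described above can be sketched as three REST calls. This is only an illustration, assuming the `tables.insert`, `tables.get` and `tables.delete` endpoints under the same `https://www.googleapis.com/bigquery/v2` base used elsewhere in this thread. The function name `view_hack_requests` and the project, dataset and view names are hypothetical, and authentication is left out entirely.

```python
import json

API_BASE = "https://www.googleapis.com/bigquery/v2"


def view_hack_requests(project, dataset, view_id, sql, use_legacy_sql=False):
    """Return (method, url, body) triples for the create/read/delete cycle."""
    tables_url = f"{API_BASE}/projects/{project}/datasets/{dataset}/tables"
    view_url = f"{tables_url}/{view_id}"
    create_body = {
        "tableReference": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": view_id,
        },
        # Supplying a "view" definition creates a logical view, not a table.
        # Assumption: the query is standard SQL unless the caller says otherwise.
        "view": {"query": sql, "useLegacySql": use_legacy_sql},
    }
    return [
        ("POST", tables_url, json.dumps(create_body)),  # create the view
        ("GET", view_url, None),      # the response carries the "schema" key
        ("DELETE", view_url, None),   # clean up the temporary view
    ]


# Hypothetical names, for illustration only.
for method, url, body in view_hack_requests(
    "my-project", "scratch", "tmp_schema_probe", "SELECT 1 AS x"
):
    print(method, url)
```

Building the requests separately from sending them keeps the sketch independent of any particular HTTP client or credential setup.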

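Answer 1: (score: 0)

Assuming you can get the schema through the API, you need to call the Tables: get method to retrieve a table's schema.

For the natality table in the samples dataset of the publicdata project, the request would be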

   GET https://www.googleapis.com/bigquery/v2/projects/publicdata/datasets/samples/tables/natality?key={YOUR_API_KEY}

and the corresponding response would be

{ 
 "kind": "bigquery#table",
 "etag": "\"nwg3tKAm7RiC5vqWthFIuCNSGxs/MTQ0MDYyNTMzMDYwNA\"",
 "id": "publicdata:samples.natality",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/publicdata/datasets/samples/tables/natality",
 "tableReference": {
  "projectId": "publicdata",
  "datasetId": "samples",
  "tableId": "natality"
 },
 "description": "This table describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008. The Centers for Disease Control (CDC) and Prevention's National Center for Health Statistics (NCHS) receives this data as electronic files, prepared from individual records processed by each registration area, through the Vital Statistics Cooperative Program. \n\nYou can access the CDC's data at: http://www.cdc.gov/nchs/data_access/Vitalstatsonline.htm",
 "schema": {
  "fields": [
   {
    "name": "source_year",
    "type": "INTEGER",
    "mode": "REQUIRED",
    "description": "Four-digit year of the birth. Example: 1975."
   },
   {
    "name": "year",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Four-digit year of the birth. Example: 1975."
   },
   {
    "name": "month",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Month index of the date of birth, where 1=January."
   },
   {
    "name": "day",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Day of birth, starting from 1."
   },
   {
    "name": "wday",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Day of the week, where 1 is Sunday and 7 is Saturday."
   },
   {
    "name": "state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The two character postal code for the state. Entries after 2004 do not include this value."
   },
   {
    "name": "is_male",
    "type": "BOOLEAN",
    "mode": "REQUIRED",
    "description": "TRUE if the child is male, FALSE if female."
   },
   {
    "name": "child_race",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "The race of the child. One of the following numbers:\n\n1 - White\n2 - Black\n3 - American Indian\n4 - Chinese\n5 - Japanese\n6 - Hawaiian\n7 - Filipino\n9 - Unknown/Other\n18 - Asian Indian\n28 - Korean\n39 - Samoan\n48 - Vietnamese"
   },
   {
    "name": "weight_pounds",
    "type": "FLOAT",
    "mode": "NULLABLE",
    "description": "Weight of the child, in pounds."
   },
   {
    "name": "plurality",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "How many children were born as a result of this pregnancy. twins=2, triplets=3, and so on."
   },
   {
    "name": "apgar_1min",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 1 minute. Available from 1978-2002."
   },
   {
    "name": "apgar_5min",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 5 minutes. Available from 1978-2002."
   },
   {
    "name": "mother_residence_state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The two-letter postal code of the mother's state of residence when the child was born."
   },
   {
    "name": "mother_race",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Race of the mother. Same values as child_race."
   },
   {
    "name": "mother_age",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Reported age of the mother when giving birth."
   },
   {
    "name": "gestation_weeks",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "The number of weeks of the pregnancy."
   },
   {
    "name": "lmp",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "Date of the last menstrual period in the format MMDDYYYY. Unknown values are recorded as \"99\" or \"9999\"."
   },
   {
    "name": "mother_married",
    "type": "BOOLEAN",
    "mode": "NULLABLE",
    "description": "True if the mother was married when she gave birth."
   },
   {
    "name": "mother_birth_state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The two-letter postal code of the mother's birth state."
   },
   {
    "name": "cigarette_use",
    "type": "BOOLEAN",
    "mode": "NULLABLE",
    "description": "True if the mother smoked cigarettes. Available starting 2003."
   },
   {
    "name": "cigarettes_per_day",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of cigarettes smoked by the mother per day. Available starting 2003."
   },
   {
    "name": "alcohol_use",
    "type": "BOOLEAN",
    "mode": "NULLABLE",
    "description": "True if the mother used alcohol. Available starting 1989."
   },
   {
    "name": "drinks_per_week",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of drinks per week consumed by the mother. Available starting 1989."
   },
   {
    "name": "weight_gain_pounds",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of pounds gained by the mother during pregnancy."
   },
   {
    "name": "born_alive_alive",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of children previously born to the mother who are now living."
   },
   {
    "name": "born_alive_dead",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of children previously born to the mother who are now dead."
   },
   {
    "name": "born_dead",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of children who were born dead (i.e. miscarriages)"
   },
   {
    "name": "ever_born",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Total number of children to whom the woman has ever given birth (includes the current birth)."
   },
   {
    "name": "father_race",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Race of the father. Same values as child_race."
   },
   {
    "name": "father_age",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Age of the father when the child was born."
   },
   {
    "name": "record_weight",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "1 or 2, where 1 is a row from a full-reporting area, and 2 is a row from a 50% sample area."
   }
  ]
 },
 "numBytes": "23562717384",
 "numRows": "137826763",
 "creationTime": "1335916045005",
 "lastModifiedTime": "1440625330604",
 "type": "TABLE",
 "location": "US"
}
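
Once a response like the one above is in hand, the schema sits under `schema.fields`. A minimal sketch of pulling out the name, type and mode of each top-level field follows; the helper name `schema_fields` is hypothetical, the sample text is an abbreviated copy of the response above, and nested RECORD fields are not handled.

```python
import json


def schema_fields(table_resource):
    """Return (name, type, mode) for each top-level field of a table resource."""
    fields = table_resource.get("schema", {}).get("fields", [])
    # Assumption: a field without an explicit "mode" is treated as NULLABLE.
    return [(f["name"], f["type"], f.get("mode", "NULLABLE")) for f in fields]


# Abbreviated copy of the Tables: get response for publicdata:samples.natality.
response_text = """
{
 "kind": "bigquery#table",
 "schema": {
  "fields": [
   {"name": "source_year", "type": "INTEGER", "mode": "REQUIRED"},
   {"name": "year", "type": "INTEGER", "mode": "NULLABLE"},
   {"name": "state", "type": "STRING", "mode": "NULLABLE"}
  ]
 }
}
"""

for name, field_type, mode in schema_fields(json.loads(response_text)):
    print(f"{name}: {field_type} ({mode})")
```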

If the command line is more convenient, you can run the bq command with the following arguments to get the table's schema:

bq show publicdata:samples.natality

The output looks like this:

Table publicdata:samples.natality

   Last modified                  Schema                 Total Rows   Total Bytes   Expiration
 ----------------- ------------------------------------ ------------ ------------- ------------
  27 Aug 00:42:10   |- source_year: integer (required)   137826763    23562717384
                    |- year: integer
                    |- month: integer
                    |- day: integer
                    |- wday: integer
                    |- state: string
                    |- is_male: boolean (required)
                    |- child_race: integer
                    |- weight_pounds: float
                    |- plurality: integer
                    |- apgar_1min: integer
                    |- apgar_5min: integer
                    |- mother_residence_state: string
                    |- mother_race: integer
                    |- mother_age: integer
                    |- gestation_weeks: integer
                    |- lmp: string
                    |- mother_married: boolean
                    |- mother_birth_state: string
                    |- cigarette_use: boolean
                    |- cigarettes_per_day: integer
                    |- alcohol_use: boolean
                    |- drinks_per_week: integer
                    |- weight_gain_pounds: integer
                    |- born_alive_alive: integer
                    |- born_alive_dead: integer
                    |- born_dead: integer
                    |- ever_born: integer
                    |- father_race: integer
                    |- father_age: integer
                    |- record_weight: integer

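Answer 2: (score: 0)

On success [ref], the schema is included in the query response body whether the job is a normal run or a dry run, which is probably how views get their schema without running the query.

However, if you want to retrieve it with BigQuery's Python library, you have to reach into "internal" attributes and methods of the QueryJob class, as shown below, because no "public" ones are provided for this...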

from google.cloud import bigquery
# bigquery.__version__ == '1.9.0'

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True)

query_job = client.query(
    query="SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013`",
    job_config=job_config,
)

# Solution 1
schema = query_job._properties['statistics']['query']['schema']

# Solution 2 
job_stats = query_job._job_statistics()
schema = job_stats['schema']
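
The same lookup can also be written against the raw job resource, without depending on the library's private members. This is a rough sketch, assuming the `jobs.insert` request takes `configuration.dryRun` and that the schema comes back under `statistics.query.schema`, the same path the snippet above reads. The helper names and the sample response are hypothetical and hand-written for illustration, not real API output.

```python
def dry_run_job_body(sql, use_legacy_sql=False):
    """Build a job request body that asks for validation only, no execution."""
    return {
        "configuration": {
            "dryRun": True,
            "query": {"query": sql, "useLegacySql": use_legacy_sql},
        }
    }


def schema_from_job(job_resource):
    """Return the result schema from a job resource, or None if it is absent."""
    stats = job_resource.get("statistics", {})
    return stats.get("query", {}).get("schema")


# Hand-written stand-in for a dry-run response (illustration only).
sample_response = {
    "status": {"state": "DONE"},
    "statistics": {
        "query": {
            "schema": {
                "fields": [
                    {"name": "state", "type": "STRING", "mode": "NULLABLE"},
                    {"name": "number", "type": "INTEGER", "mode": "NULLABLE"},
                ]
            }
        }
    },
}

schema = schema_from_job(sample_response)
print([field["name"] for field in schema["fields"]])
```

Returning None when the key is missing makes it easy to detect the situation described in the question, where a dry run reports statistics but no schema.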

It took me a while to figure this out. Hope this helps!