我需要一个查询来查找Bigquery中表的列名,就像SQL中的以下查询一样:
SELECT column_name,data_type,data_length,data_precision,nullable FROM all_tab_cols where table_name ='EMP';
答案 0 :(得分:2)
目前无法通过查询检索表元数据(即列名和类型),但这不是第一次请求它。
您是否有理由将此作为查询?表元数据可通过tables API获得。
答案 1 :(得分:1)
实际上可以使用SQL来实现。为此,您需要在日志记录表中查询正在创建的特定表的最后一个日志。
例如,假设每天加载/创建表:
CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema String)
RETURNS ARRAY<STRING> AS ((
SELECT
SPLIT(
REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
,',')
));
WITH valid_schema_columns AS (
WITH array_output aS (SELECT
jsonSchemaStringToArray(jsonSchema) AS column_names
FROM (
SELECT
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
, ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
WHERE
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
) AS t
WHERE
t.record_count = 1 -- grab the latest entry
)
-- this is actually what UNNESTS the array into standard rows
SELECT
valid_column_name
FROM array_output
LEFT JOIN UNNEST(column_names) AS valid_column_name
)
答案 2 :(得分:1)
是的,您可以使用INFORMATION_SCHEMA获取表元数据。
过去链接中提到的示例之一是从INFO_SCHEMA.COLUMN_FIELD_PATHS视图中获取github_repos数据集中的commits表的元数据,
在GCP控制台中打开BigQuery网络用户界面。
在“查询编辑器”框中输入以下标准SQL查询。 INFORMATION_SCHEMA需要标准的SQL语法。标准SQL是GCP控制台中的默认语法。
SELECT
*
FROM
`bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
table_name="commits"
AND column_name="author"
OR column_name="difference"
注意:INFORMATION_SCHEMA视图名称区分大小写。
结果应如下所示
+------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_name | column_name | field_path | data_type | description |
+------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| commits | author | author | STRUCT<name STRING, email STRING, time_sec INT64, tz_offset INT64, date TIMESTAMP> | NULL |
| commits | author | author.name | STRING | NULL |
| commits | author | author.email | STRING | NULL |
| commits | author | author.time_sec | INT64 | NULL |
| commits | author | author.tz_offset | INT64 | NULL |
| commits | author | author.date | TIMESTAMP | NULL |
| commits | difference | difference | ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, new_path STRING, old_sha1 STRING, new_sha1 STRING, old_repo STRING, new_repo STRING>> | NULL |
| commits | difference | difference.old_mode | INT64 | NULL |
| commits | difference | difference.new_mode | INT64 | NULL |
| commits | difference | difference.old_path | STRING | NULL |
| commits | difference | difference.new_path | STRING | NULL |
| commits | difference | difference.old_sha1 | STRING | NULL |
| commits | difference | difference.new_sha1 | STRING | NULL |
| commits | difference | difference.old_repo | STRING | NULL |
| commits | difference | difference.new_repo | STRING | NULL |
+------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
答案 3 :(得分:0)
BigQuery现在支持信息模式:
SELECT column_name
FROM `bigquery-public-data`.irs_990.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'irs_990_2015'
答案 4 :(得分:0)
对于像我这样的新手,上面的语法如下:
select * from project_name.dataset_name.INFORMATION_SCHEMA.COLUMNS where table_catalog=project_name and table_schema=dataset_name and table_name=table_name
答案 5 :(得分:0)
要检查列,您可以通过CLI访问表
bq query --use_legacy_sql=false 'select Hour, sum(column 1) as column from `project_id.dataset.table_name` where Date(Hour) = '2020-06-10';'