How do I find the size of a Google BigQuery dataset, rather than the size of a table?

Time: 2015-12-07 17:22:36

Tags: google-bigquery

I can see the metadata details for a single table in BigQuery, but for project estimation I would like to see the metadata for the whole dataset.

SELECT * From 'dataset'.__TABLES_SUMMARY__ WHERE size_bytes>0 isn't working for me.

2 answers:

Answer 0: (score: 13)

SELECT SUM(size_bytes) AS bytes 
FROM [yourdataset.__TABLES__]

Answer 1: (score: 0)

The previous answer is correct, but I would like to expand on it.

In BigQuery Standard SQL, you can query the size of each dataset as follows:

SELECT
  dataset_id,
  count(*) AS tables,
  SUM(row_count) AS total_rows,
  SUM(size_bytes) AS size_bytes
FROM ( 
  SELECT * FROM `dataset1.__TABLES__` UNION ALL
  SELECT * FROM `dataset2.__TABLES__` UNION ALL
  ...
)
GROUP BY 1
ORDER BY size_bytes DESC

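Unfortunately, I have not found a way to list all tables across all datasets of a project. Instead, I use the bq command line to generate all the SELECT ... UNION ALL statements: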

bq ls --format=json | jq -r '.[] | select(.location == "EU") | .id' | sed 's/:/./' | sed 's/\(.*\)/SELECT * FROM `\1.__TABLES__` UNION ALL/'
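
Note that the pipeline above also appends UNION ALL to the last generated line, so that trailing UNION ALL has to be removed by hand before pasting the result into the subquery. Below is a minimal sketch of one way to avoid that step. It assumes the dataset IDs arrive one per line in project:dataset form (which is what the sed 's/:/./' step suggests). The function name build_union is a hypothetical helper, and the sample IDs are made up to stand in for the real bq ls | jq output:

```shell
#!/bin/sh
# build_union (hypothetical helper): reads dataset IDs from stdin, one per
# line in project:dataset form, and prints one SELECT per dataset, joined
# with UNION ALL on every line except the last.
build_union() {
  sed 's/:/./' |
    sed 's/\(.*\)/SELECT * FROM `\1.__TABLES__`/' |
    sed '$!s/$/ UNION ALL/'
}

# Made-up sample IDs standing in for the output of the bq ls | jq step.
printf '%s\n' 'myproject:dataset1' 'myproject:dataset2' | build_union
```

In practice the printf line would be replaced by the bq ls ... | jq ... part of the command above, piped into build_union, and the output pasted between the parentheses of the FROM ( ... ) subquery.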