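I have a SQL table (actually a BigQuery table) with a very large number of columns (more than a thousand). I want to quickly find the minimum and maximum value of each column. Is there a way to do this?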
Listing every column by hand is not an option. I'm looking for a way to do something like
SELECT MAX(*) FROM mytable;
and then run
SELECT MIN(*) FROM mytable;
I haven't been able to find a way to do this by searching on Google, and I'm not even sure it's possible.
For example, if my table has the following schema:
col1 col2 col3 .... col1000
the max query should return something like
Row col1 col2 col3 ... col1000
1 3 18 0.6 ... 45
and the min query should return something like
Row col1 col2 col3 ... col1000
1 -5 4 0.1 ... -5
These numbers are only for illustration. The column names can be arbitrary strings and are not easy to script against.
Answer 0 (score: 3)
See the BigQuery Standard SQL example below. It works for any number of columns and does not require referencing or using column names explicitly.
#standardSQL
-- sample data standing in for your real table; remove this WITH clause to query the actual table
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
)
SELECT
MIN(CAST(value AS INT64)) AS min_value,
MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t,
-- serialize each row to JSON, then extract every column value with a regex and flatten them
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
Result:
Row min_value max_value
1 -1 11
Note: if your columns are of type STRING, remove the CAST ... AS INT64;
if they are FLOAT64, replace INT64 with FLOAT64 in the CAST function.
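To see what the regex actually pulls out of each row, here is a minimal Python sketch that emulates the query locally. It assumes that `json.dumps(row, separators=(",", ":"))` yields the same compact `{"col1":1,...}` string that `TO_JSON_STRING` produces for a flat row of numbers, and it uses Python's `re.findall` as a stand-in for `REGEXP_EXTRACT_ALL`; `extract_values` is a hypothetical helper name. It also assumes no NULLs (a NULL column would serialize as `null`, which an INT64 cast cannot parse), no nested fields, and no string values containing `,"`.

```python
import json
import re

# Same sample rows as the WITH clause in the query above.
rows = [
    {"col1": 1, "col2": 2, "col3": 3, "col4": 4},
    {"col1": 7, "col2": 6, "col3": 5, "col4": 4},
    {"col1": -1, "col2": 11, "col3": 5, "col4": 8},
]

# Same pattern as in the query: capture what sits between '":' and the next ',"' or '}'.
VALUE_RE = re.compile(r'":(.*?)(?:,"|})')


def extract_values(row):
    """Hypothetical helper emulating REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), ...) for one flat row."""
    as_json = json.dumps(row, separators=(",", ":"))  # compact JSON, assumed to match TO_JSON_STRING
    return VALUE_RE.findall(as_json)  # one string per column, in column order


# Flatten every value of every row (the UNNEST), cast to int (the CAST), then aggregate.
values = [int(v) for row in rows for v in extract_values(row)]
min_value, max_value = min(values), max(values)
print(min_value, max_value)  # -1 11
```

The output matches the `min_value` / `max_value` row returned by the query, since both aggregate over all values of all columns at once.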
Update
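Below is an option that computes the MIN/MAX of each column separately and presents the results as arrays, with one element per column, in column order.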
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
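), temp AS (
-- pos is the value's position in the row's JSON (i.e. the column position), so GROUP BY pos gives per-column stats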
SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
GROUP BY pos
)
-- collect the per-column stats into arrays ordered by column position
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos)) FROM temp
Result:
Row stats vals
1 min_values [-1,2,3,4]
2 max_values [7,11,5,14]
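The per-column variant can be emulated the same way. In this Python sketch, `enumerate` over the regex matches plays the role of `WITH OFFSET pos`, a dict keyed by position plays the role of `GROUP BY pos`, and iterating over the sorted positions mirrors `ARRAY_AGG(... ORDER BY pos)`. As before, `json.dumps` with compact separators is assumed to stand in for `TO_JSON_STRING`, and the sketch assumes flat, non-NULL integer columns.

```python
import json
import re
from collections import defaultdict

# Same sample rows as the "Update" query (note col4 = 14 in the first row).
rows = [
    {"col1": 1, "col2": 2, "col3": 3, "col4": 14},
    {"col1": 7, "col2": 6, "col3": 5, "col4": 4},
    {"col1": -1, "col2": 11, "col3": 5, "col4": 8},
]

VALUE_RE = re.compile(r'":(.*?)(?:,"|})')

# pos -> every value seen at that position (UNNEST ... WITH OFFSET pos, then GROUP BY pos).
by_pos = defaultdict(list)
for row in rows:
    as_json = json.dumps(row, separators=(",", ":"))  # assumed equivalent of TO_JSON_STRING
    for pos, value in enumerate(VALUE_RE.findall(as_json)):
        by_pos[pos].append(int(value))

# One element per column, ordered by position (ARRAY_AGG(... ORDER BY pos)).
positions = sorted(by_pos)
min_values = [min(by_pos[p]) for p in positions]
max_values = [max(by_pos[p]) for p in positions]
print(json.dumps(min_values, separators=(",", ":")))  # [-1,2,3,4]
print(json.dumps(max_values, separators=(",", ":")))  # [7,11,5,14]
```

Because `pos` is only the index of the value within the serialized row, element *i* of each array corresponds to the table's *i*-th column (0-based); the query returns positions rather than names, so labeling the results requires reading the column order from the table schema.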
Hopefully this is something you can still adapt toward your end goal.