How can I write a query in Google BigQuery to infer the data types of columns?

Time: 2017-05-16 12:56:41

Tags: google-bigquery

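I have a table in which every column is a string, but I know that some of the columns actually hold numbers (or dates). Does BigQuery have a built-in function to infer each column's data type? Something like select is_string(column_name) from table_name?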

1 answer:

Answer 0 (score: 1)

One idea that comes to mind is to combine SAFE_CAST with LOGICAL_AND, for example:

#standardSQL
WITH T AS (
  SELECT '2017-05-01' AS x, '3.14' AS y, '5' AS z UNION ALL
  SELECT '2017-03-02' AS x, '1.59' AS y, '-1' AS z UNION ALL
  SELECT NULL AS x, NULL AS y, NULL AS z
)
SELECT
  LOGICAL_AND(x IS NULL OR SAFE_CAST(x AS DATE) IS NOT NULL) AS x_is_date,
  LOGICAL_AND(y IS NULL OR SAFE_CAST(y AS FLOAT64) IS NOT NULL) AS y_is_float64,
  LOGICAL_AND(z IS NULL OR SAFE_CAST(z AS TIMESTAMP) IS NOT NULL) AS z_is_timestamp
FROM T;

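This returns true, true, and false (the z values are not timestamps). The IS NULL check lets NULL rows pass instead of counting them as failed casts. If you want to reuse the same expression several times, a SQL UDF makes it more concise: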

#standardSQL
CREATE TEMP FUNCTION IsDate(x STRING) AS (
  x IS NULL OR SAFE_CAST(x AS DATE) IS NOT NULL
);

WITH T AS (
  SELECT '2017-05-01' AS x, '3.14' AS y, '5' AS z UNION ALL
  SELECT '2017-03-02' AS x, '1.59' AS y, '-1' AS z UNION ALL
  SELECT NULL AS x, NULL AS y, NULL AS z
)
SELECT
  LOGICAL_AND(IsDate(x)) AS x_is_date,
  LOGICAL_AND(IsDate(y)) AS y_is_date,
  LOGICAL_AND(IsDate(z)) AS z_is_date
FROM T;

This returns true, false, and false, because only x contains date-formatted values.
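
As a possible extension of the same idea, the per-type checks can be chained in a CASE expression to report one inferred type per column. The sketch below is only an illustration, not a built-in feature: the CTE name Unpivoted and the output columns column_name and inferred_type are made-up names, and the list of candidate types (INT64, FLOAT64, DATE, TIMESTAMP) and their order are assumptions you would adjust for your own data. Narrower types are tested first, since a string that casts to INT64 also casts to FLOAT64.

```sql
#standardSQL
WITH T AS (
  SELECT '2017-05-01' AS x, '3.14' AS y, '5' AS z UNION ALL
  SELECT '2017-03-02' AS x, '1.59' AS y, '-1' AS z UNION ALL
  SELECT NULL AS x, NULL AS y, NULL AS z
),
-- Turn each column into (name, value) rows so one expression covers all columns.
-- Unpivoted, column_name and inferred_type are hypothetical names.
Unpivoted AS (
  SELECT col.name AS column_name, col.value AS value
  FROM T, UNNEST([
    STRUCT('x' AS name, x AS value),
    STRUCT('y' AS name, y AS value),
    STRUCT('z' AS name, z AS value)
  ]) AS col
)
SELECT
  column_name,
  -- The first type that every non-NULL value can be cast to wins.
  CASE
    WHEN LOGICAL_AND(value IS NULL OR SAFE_CAST(value AS INT64) IS NOT NULL) THEN 'INT64'
    WHEN LOGICAL_AND(value IS NULL OR SAFE_CAST(value AS FLOAT64) IS NOT NULL) THEN 'FLOAT64'
    WHEN LOGICAL_AND(value IS NULL OR SAFE_CAST(value AS DATE) IS NOT NULL) THEN 'DATE'
    WHEN LOGICAL_AND(value IS NULL OR SAFE_CAST(value AS TIMESTAMP) IS NOT NULL) THEN 'TIMESTAMP'
    ELSE 'STRING'
  END AS inferred_type
FROM Unpivoted
GROUP BY column_name
ORDER BY column_name;
```

On the sample data this should report DATE for x, FLOAT64 for y, and INT64 for z. Note that a column containing only NULLs passes every check, so it would be reported as the first type in the list (INT64 here).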