Iterating a SQL query over several date strings

Time: 2019-10-15 19:21:49

Tags: sql google-bigquery

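I have a set of BigQuery tables that collect test results daily, each named like various_tests.test_name_20190523. I have a query that runs over a specified date range to find the failure count and failure rate across all tests, but I would prefer to get a table covering multiple date ranges, with one row per range, e.g. BETWEEN "20190901" AND "20190916", BETWEEN "20190916" AND "20191001", BETWEEN "20191001" AND "20191016". The columns in each row would be the same as in the single-row result here. Is there a good way to do this?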

SELECT
  "20190916" as StartDate, "20191001" as EndDate, 
  SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
  (SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
  `various_tests.test_name_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20190916" AND "20191001"

(The real query has several more conditions in the WHERE and CASE clauses, omitted here for clarity.)

2 answers:

Answer 0: (score: 1)

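One approach is to use scripting. The example below uses bigquery-public-data.google_analytics_sample.ga_sessions_* to illustrate the idea; you can easily adapt it to your case.

The date_ranges array can also be generated to suit your needs.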

DECLARE date_ranges ARRAY<STRUCT<s STRING, e STRING>>
  DEFAULT [
   ('20170801', '20170802'), 
   ('20170703', '20170704'),
   ('20170603', '20170604')
   ];
DECLARE index INT64 DEFAULT 0;
CREATE TEMP TABLE result(s STRING, e STRING, cnt INT64); 
LOOP
  IF index = array_length(date_ranges) 
    THEN BREAK;
  END IF;
  BEGIN
    DECLARE date_start STRING DEFAULT date_ranges[OFFSET(index)].s;
    DECLARE date_end STRING DEFAULT date_ranges[OFFSET(index)].e;
    INSERT INTO result
    SELECT date_start, date_end, count(*) cnt
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` 
    WHERE _TABLE_SUFFIX BETWEEN date_start and date_end ;
    SET index = index + 1;
  END;
END LOOP;
SELECT * FROM result;

Output

+----------+----------+------+
|    s     |    e     | cnt  |
+----------+----------+------+
| 20170703 | 20170704 | 3984 |
| 20170603 | 20170604 | 2933 |
| 20170801 | 20170802 | 2556 |
+----------+----------+------+
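
As a rough illustration of generating date_ranges instead of typing the list by hand, the declaration in the script above could be replaced with something like the following. This is an untested sketch: the start date, end date, and 15-day step are assumptions chosen for illustration, while GENERATE_DATE_ARRAY, FORMAT_DATE, DATE_ADD, and ARRAY_AGG are standard BigQuery functions.

```sql
-- Sketch: build consecutive 15-day ranges as 'YYYYMMDD' strings.
-- The dates and the step below are illustrative assumptions.
DECLARE date_ranges ARRAY<STRUCT<s STRING, e STRING>> DEFAULT (
  SELECT ARRAY_AGG(
    STRUCT(
      FORMAT_DATE('%Y%m%d', d) AS s,
      -- Each range ends 14 days after its start, so ranges do not overlap.
      FORMAT_DATE('%Y%m%d', DATE_ADD(d, INTERVAL 14 DAY)) AS e
    )
    ORDER BY d
  )
  FROM UNNEST(
    GENERATE_DATE_ARRAY(DATE '2019-09-01', DATE '2019-10-01', INTERVAL 15 DAY)
  ) AS d
);

-- Inspect the generated ranges before using them in the loop.
SELECT * FROM UNNEST(date_ranges);
```

With these inputs the range starts are 20190901, 20190916, and 20191001. Changing INTERVAL 14 DAY to INTERVAL 15 DAY would reproduce ranges that share their boundary dates, as in the question.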

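Cost

The same as running the query manually several times with different start and end dates.

Performance

Because multiple INSERT INTO statements have to run one after another, this is less efficient than a single query (which you would have to hand-craft).

Scalability

The temporary table is still subject to the per-table daily DML quota, so a single script could complete at most 1,000 inserts at the time of writing.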
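
For comparison, the hand-crafted single query mentioned under Performance might look roughly like the sketch below. It is untested and rests on assumptions: the table name and the status and foo columns are taken from the question, the range list is illustrative, and COUNTIF and SAFE_DIVIDE are used here as shorthand for the question's SUM(CASE ...) expressions.

```sql
-- Sketch: one pass over the wildcard tables, one output row per date range.
WITH date_ranges AS (
  SELECT * FROM UNNEST([
    STRUCT('20190901' AS StartDate, '20190916' AS EndDate),
    STRUCT('20190916' AS StartDate, '20191001' AS EndDate),
    STRUCT('20191001' AS StartDate, '20191016' AS EndDate)
  ])
)
SELECT
  r.StartDate,
  r.EndDate,
  COUNTIF(status = 'BAD' AND foo = 'bar') AS Bad,
  COUNT(*) AS Total,
  SAFE_DIVIDE(COUNTIF(status = 'BAD' AND foo = 'bar'), COUNT(*)) AS Ratio
FROM `various_tests.test_name_*`
JOIN date_ranges AS r
  -- A daily table that falls in two ranges contributes to both rows.
  ON _TABLE_SUFFIX BETWEEN r.StartDate AND r.EndDate
-- Constant bounds, intended to limit which daily tables are scanned.
WHERE _TABLE_SUFFIX BETWEEN '20190901' AND '20191016'
GROUP BY r.StartDate, r.EndDate
ORDER BY r.StartDate;
```

Because the join matches each daily table against every range, a boundary date such as 20190916 is counted in both of the ranges that include it. The constant filter in the WHERE clause is kept separate from the join condition on the assumption that only constant _TABLE_SUFFIX filters reduce the number of tables scanned.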

Answer 1: (score: 0)

Try the following

SELECT
  -- "20190916" as StartDate, "20191001" as EndDate, 
  _TABLE_SUFFIX AS Day,
  SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
  (SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
  `various_tests.test_name_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
GROUP BY DAY
  

To define multiple date ranges and run the query for each of them:

SELECT
  -- "20190916" as StartDate, "20191001" as EndDate, 
  CASE 
    WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20190916"
    WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20180916"
  END AS StartDate,
  CASE 
    WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20191001"
    WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20181001"
  END AS EndDate,
  SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
  (SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
  `various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" 
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" 
GROUP BY StartDate, EndDate  
  

... an easier way ...

To avoid repeating all the conditions, try the following

SELECT
  -- "20190916" as StartDate, "20191001" as EndDate, 
  (CASE 
    WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN STRUCT("20190916" AS StartDate, "20191001" AS EndDate)
    WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN ("20180916", "20181001")
  END).*,
  SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
  (SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
  `various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" 
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" 
GROUP BY StartDate, EndDate