我有一组BigQuery表,它们每天收集测试结果,每个表的名称都类似于various_tests.test_name_20190523
。我有一个查询,可以在指定的日期范围内运行以查找失败次数和所有测试的失败率,但是我更喜欢获得一个表,该表具有多个日期范围,每个日期范围在表中排成一行,例如BETWEEN "20190901" AND "20190916", BETWEEN "20190916" AND "20191001", BETWEEN "20191001" AND "20191016"
。每行的列将与此处的单行结果相同。有什么好方法吗?
SELECT
"20190916" as StartDate, "20191001" as EndDate,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE
_TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
(真实查询在WHERE
和CASE
子句中还有其他几个条件,但为清楚起见将其省略。)
答案 0 :(得分:1)
一种方法是使用脚本。下面的示例使用bigquery-public-data.google_analytics_sample.ga_sessions_*
来说明这个想法。您可以轻松地将其适应您的情况。
也可以根据您的需要生成date_ranges
。
DECLARE date_ranges ARRAY<STRUCT<s STRING, e STRING>>
DEFAULT [
('20170801', '20170802'),
('20170703', '20170704'),
('20170603', '20170604')
];
DECLARE index INT64 DEFAULT 0;
CREATE TEMP TABLE result(s STRING, e STRING, cnt INT64);
LOOP
IF index = array_length(date_ranges)
THEN BREAK;
END IF;
BEGIN
DECLARE date_start STRING DEFAULT date_ranges[OFFSET(index)].s;
DECLARE date_end STRING DEFAULT date_ranges[OFFSET(index)].e;
INSERT INTO result
SELECT date_start, date_end, count(*) cnt
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN date_start and date_end ;
SET index = index + 1;
END;
END LOOP;
SELECT * FROM result;
输出
+----------+----------+------+
| s | e | cnt |
+----------+----------+------+
| 20170703 | 20170704 | 3984 |
| 20170603 | 20170604 | 2933 |
| 20170801 | 20170802 | 2556 |
+----------+----------+------+
费用
与手动更改开始日期/结束日期多次运行的费用相同。
性能
由于必须单独运行多个INSERT INTO,因此效率不如单个查询(您必须手动烹饪)。
可扩展性
临时表仍受每个表的每日DML配额限制,因此单个脚本只能完成1000次插入。
答案 1 :(得分:0)
尝试以下
SELECT
-- "20190916" as StartDate, "20191001" as EndDate,
_TABLE_SUFFIX AS Day,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE
_TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
GROUP BY DAY
确定多个日期范围并为每个查询运行查询的方法
SELECT
-- "20190916" as StartDate, "20191001" as EndDate,
CASE
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20190916"
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20180916"
END AS StartDate,
CASE
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20191001"
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20181001"
END AS EndDate,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001"
GROUP BY StartDate, EndDate
...一种更简单的方法...
为避免重复所有条件,请尝试以下操作
SELECT
-- "20190916" as StartDate, "20191001" as EndDate,
(CASE
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN STRUCT("20190916" AS StartDate, "20191001" AS EndDate)
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN ("20180916", "20181001")
END).*,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001"
GROUP BY StartDate, EndDate