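In the Google BigQuery query below, I join two tables, "Data" and "Location", on Id, StartTime and StopTime.
Because the data is partitioned by date, the WHERE clause has a condition on _PARTITIONTIME.
The query takes a long time to run (about 20 minutes), and I am wondering whether I am missing some technique that would make it more efficient.
Any help would be appreciated. Thanks!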
SELECT
*
FROM (
SELECT
A.Id AS Id, A.Id1 AS Id1, StartTime, StopTime, Latitude, Longitude, DateTime
FROM
`Data` AS A
JOIN
(SELECT * FROM `Location` WHERE _TABLE_SUFFIX IN ("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18",
"19","20","21", "22", "23","24", "26", "27", "28","29","30","31" )) AS B
ON
A.StartTime < B.DateTime
AND A.StopTime >= B.DateTime
AND A.Id = B.Id
WHERE
(A._PARTITIONTIME BETWEEN TIMESTAMP('2016-11-01')
AND TIMESTAMP('2016-11-30'))
ORDER BY
B.Id,
A.Id1,
B.DateTime )
ORDER BY
Id,
Id1,
DateTime
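Answer 0 (score: 1)
A few thoughts:
- The inner ORDER BY is not needed, because only the top-level ORDER BY affects the query result.
- To match every suffix except "25", you can use _TABLE_SUFFIX BETWEEN "01" AND "31" AND _TABLE_SUFFIX != "25".
- Depending on the type of JOIN, the filter on _PARTITIONTIME may not be "pushed down" automatically to avoid reading extra data, for example if you are actually using a RIGHT JOIN. If that is the case, use a subquery instead, e.g. (SELECT * FROM YourTable WHERE _PARTITIONTIME BETWEEN ...) AS A RIGHT JOIN ...
If you would like a BigQuery engineer to look at the timing in more detail, you can add a sample job ID to the question and someone can help.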
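Answer 1 (score: 0)
I would also remove the outer ORDER BY, as I think it is the main killer of query performance.
Moving the _PARTITIONTIME filter into the subquery for the table it belongs to is another thing to consider.
Using SELECT * in a subselect does not affect performance or cost (it is the final outer SELECT that determines which columns are read, in addition to those used in WHERE and other clauses), but as good practice I think it is better to list the needed columns explicitly.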
#standardSQL
SELECT
A.Id AS Id, A.Id1 AS Id1, StartTime, StopTime, Latitude, Longitude, DateTime
FROM (
SELECT Id, Id1, StartTime, StopTime
FROM `Data`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-11-01') AND TIMESTAMP('2016-11-30')
) AS A
JOIN (
SELECT Id, Latitude, Longitude, DateTime  -- Id is required by the join condition A.Id = B.Id
FROM `Location`
WHERE _TABLE_SUFFIX IN ("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18",
"19","20","21", "22", "23","24", "26", "27", "28","29","30","31" )
) AS B
ON A.StartTime < B.DateTime
AND A.StopTime >= B.DateTime
AND A.Id = B.Id
You could also "compress" the statement below, as Elliott suggested:
WHERE _TABLE_SUFFIX IN ("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18",
"19","20","21", "22", "23","24", "26", "27", "28","29","30","31" )
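But be careful, because a range filter can pull in tables you do not want (if your dataset contains any), for example those with suffixes such as '011' or '046'.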
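To see why a range filter can match unwanted suffixes, here is a small sketch that runs on literal values only, so it needs no tables. The suffix list and the column names (InRange, InRangeTwoChars) are made up for illustration. The LENGTH check is one possible guard and is not something either answer proposed.

```sql
#standardSQL
-- STRING comparison is lexicographic, so "011" and "046" sort between "01" and "31".
SELECT
  Suffix,
  -- Range filter alone: true for "011" and "046", false for "25" and "32".
  Suffix BETWEEN "01" AND "31" AND Suffix != "25" AS InRange,
  -- Same filter restricted to two-character suffixes: false for "011" and "046".
  LENGTH(Suffix) = 2
    AND Suffix BETWEEN "01" AND "31"
    AND Suffix != "25" AS InRangeTwoChars
FROM UNNEST(["01", "24", "25", "31", "011", "046", "32"]) AS Suffix
```

If your dataset only contains two-character suffixes, the extra check changes nothing and the shorter filter is enough.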
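Another option: there may be some logical relationship between the partitions in Data and the suffixes in Location. If so, you can use it to narrow the JOIN and make it more efficient.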
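As a rough sketch of that idea, suppose (purely as an assumption, the question does not say so) that each Location suffix is the day of the month of the matching Data partition. The aliases PartitionDay and LocationDay are hypothetical. The table names are the placeholders from the question, and _TABLE_SUFFIX is only available when the FROM clause names a wildcard table.

```sql
#standardSQL
-- Assumption: Location suffix "07" holds the rows for the Data partition of 2016-11-07.
SELECT
  A.Id AS Id, A.Id1 AS Id1, StartTime, StopTime, Latitude, Longitude, DateTime
FROM (
  SELECT Id, Id1, StartTime, StopTime,
    FORMAT_TIMESTAMP('%d', _PARTITIONTIME) AS PartitionDay  -- two-digit day, e.g. "07"
  FROM `Data`
  WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-11-01') AND TIMESTAMP('2016-11-30')
) AS A
JOIN (
  SELECT Id, Latitude, Longitude, DateTime,
    _TABLE_SUFFIX AS LocationDay  -- expose the suffix so the join can use it
  FROM `Location`
  WHERE _TABLE_SUFFIX IN ("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18",
  "19","20","21", "22", "23","24", "26", "27", "28","29","30","31" )
) AS B
ON A.Id = B.Id
  AND A.PartitionDay = B.LocationDay  -- extra equality key derived from the assumed relationship
  AND A.StartTime < B.DateTime
  AND A.StopTime >= B.DateTime
```

The extra equality key may reduce the number of row pairs the range conditions have to be checked against. Note that under this assumption an interval that crosses midnight would lose its matches from the following day, so check how your data is laid out first.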