I see a significant difference in the execution time of the following query when it is run against two tables with the same schema:
SELECT
VCF.reference_name AS chrm,
VCF.start AS start,
VCF.END AS END,
VCF.reference_bases AS reference_bases,
VCF.alternate_bases AS alternate_bases,
CONCAT("hg19_refGene: ",Annotation1.name ),
Annotation1.name2,
FROM (
SELECT
VCF.reference_name,
VCF.start,
VCF.END,
VCF.reference_bases,
VCF.alternate_bases,
Annotation1.name,
Annotation1.name2
FROM (
SELECT
*
FROM (
SELECT
REPLACE(reference_name, 'chr', '') AS reference_name,
start,
END,
reference_bases,
alternate_bases
FROM
[genomics-public-data:platinum_genomes.variants]
OMIT
RECORD IF EVERY(call.genotype <= 0))) AS VCF
JOIN
[ANOTHER_TABLE] AS Annotation1
ON
(Annotation1.chrm = VCF.reference_name)
WHERE
(((Annotation1.start >= VCF.start)
AND (Annotation1.start <= VCF.END))
OR ((Annotation1.END >= VCF.start)
AND (Annotation1.END <= VCF.END))
OR ((Annotation1.start <= VCF.start)
AND (Annotation1.END >= VCF.start))
OR ((Annotation1.start <= VCF.END)
AND (Annotation1.END >= VCF.END))))
This query takes 56.0 s and processes 5.45 GB [Job ID: bquijob_217bb63b_15f4a9e559f].
If I replace the public table in the query with my own table, which has exactly the same schema and fewer records, the query takes 309.0 s and processes 1.06 GB [Job ID: bquijob_40adc8f0_15f4aaab5d0].
BigQuery folks, could you please look into these two jobs? Am I missing an optimization step when creating my own table?
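
For reference, the WHERE clause in the query is an interval-overlap test between each annotation and each variant. Below is a minimal Python sketch of that condition, assuming closed intervals with start <= END in both tables (an assumption, not something verified against the data). The function names are hypothetical and used only for illustration. Under that assumption, the four OR-ed conditions appear to reduce to a two-comparison form, which the brute-force check confirms on small integer coordinates. It says nothing about why the two jobs differ in run time.

```python
from itertools import product


def overlaps_four_way(a_start, a_end, v_start, v_end):
    """The four OR-ed conditions, transcribed from the WHERE clause."""
    return ((a_start >= v_start and a_start <= v_end)
            or (a_end >= v_start and a_end <= v_end)
            or (a_start <= v_start and a_end >= v_start)
            or (a_start <= v_end and a_end >= v_end))


def overlaps_two_way(a_start, a_end, v_start, v_end):
    """Closed-interval overlap test written with two comparisons."""
    return a_start <= v_end and a_end >= v_start


def forms_agree(limit=8):
    """Compare both forms on every pair of well-formed intervals in [0, limit)."""
    coords = range(limit)
    return all(
        overlaps_four_way(a, b, c, d) == overlaps_two_way(a, b, c, d)
        for a, b, c, d in product(coords, repeat=4)
        if a <= b and c <= d  # assumption: start <= END for both intervals
    )
```

Calling `forms_agree()` compares the two forms on all interval pairs with endpoints below `limit`. The equivalence depends on the start <= END assumption, so rows that violate it could match differently under the two forms.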