BigQuery - Significant performance difference between two tables with the same schema

Time: 2017-10-23 19:49:28

Tags: google-bigquery

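I am seeing a significant difference in execution time when running the following query against two tables that have the same schema: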

SELECT
  VCF.reference_name AS chrm,
  VCF.start AS start,
  VCF.END AS END,
  VCF.reference_bases AS reference_bases,
  VCF.alternate_bases AS alternate_bases,
  CONCAT("hg19_refGene: ",Annotation1.name ),
  Annotation1.name2,
FROM (
  SELECT
    VCF.reference_name,
    VCF.start,
    VCF.END,
    VCF.reference_bases,
    VCF.alternate_bases,
    Annotation1.name,
    Annotation1.name2
  FROM (
    SELECT
      *
    FROM (
      SELECT
        REPLACE(reference_name, 'chr', '') AS reference_name,
        start,
        END,
        reference_bases,
        alternate_bases
      FROM
        [genomics-public-data:platinum_genomes.variants]
      OMIT
        RECORD IF EVERY(call.genotype <= 0))) AS VCF
  JOIN
    [ANOTHER_TABLE] AS Annotation1
  ON
    (Annotation1.chrm = VCF.reference_name)
  WHERE
    (((Annotation1.start >= VCF.start)
        AND (Annotation1.start <= VCF.END))
      OR ((Annotation1.END >= VCF.start)
        AND (Annotation1.END <= VCF.END))
      OR ((Annotation1.start <= VCF.start)
        AND (Annotation1.END >= VCF.start))
      OR ((Annotation1.start <= VCF.END)
        AND (Annotation1.END >= VCF.END))))

It took 56.0s and processed 5.45 GB [Job ID: bquijob_217bb63b_15f4a9e559f].

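If I replace the public table mentioned above with my own table, which has exactly the same schema and fewer records,

then the same query takes 309.0s and processes 1.06 GB [Job ID: bquijob_40adc8f0_15f4aaab5d0].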

BigQuery folks, could you please take a look at these two jobs? Am I missing an optimization step when creating my own table?
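
For readers following the WHERE clause: the four OR-ed conditions read as an interval-overlap test between an annotation and a variant. Below is a minimal Python sketch, not part of the original jobs, that compares that four-way test with the usual two-comparison overlap test on small integer intervals. It assumes closed intervals with start <= end in both tables, which is not stated above, and the function names are made up for illustration. It says nothing about why one job is slower than the other.

```python
from itertools import product


def overlaps_four_way(a_start, a_end, v_start, v_end):
    """The four OR-ed conditions, transcribed from the WHERE clause."""
    return (
        (a_start >= v_start and a_start <= v_end)
        or (a_end >= v_start and a_end <= v_end)
        or (a_start <= v_start and a_end >= v_start)
        or (a_start <= v_end and a_end >= v_end)
    )


def overlaps_compact(a_start, a_end, v_start, v_end):
    """Standard closed-interval overlap test with two comparisons."""
    return a_start <= v_end and a_end >= v_start


def predicates_agree(limit=8):
    """Compare both predicates on every interval pair with coordinates in [0, limit)."""
    coords = range(limit)
    for a_start, a_end, v_start, v_end in product(coords, repeat=4):
        if a_start > a_end or v_start > v_end:
            continue  # assumption: every interval satisfies start <= end
        four_way = overlaps_four_way(a_start, a_end, v_start, v_end)
        compact = overlaps_compact(a_start, a_end, v_start, v_end)
        if four_way != compact:
            return False
    return True
```

If the start <= end assumption holds for both tables, the two predicates select the same rows, so the WHERE clause could be written as `Annotation1.start <= VCF.END AND Annotation1.END >= VCF.start`.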

0 answers:

No answers