BigQuery: delete the records returned by a subselect built from table joins

Time: 2019-03-19 17:26:57

Tags: sql google-bigquery

Question

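I have a set of rows produced by a join, and I need a query that deletes them. Does anyone know how to write that DELETE query? I know it sounds simple, but I can't find a way to do it.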

Code

SELECT * FROM (
        SELECT 
           entity_key, min(actual_posting_time) as min_time
        FROM 
            myTable
        WHERE 
            _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
        GROUP BY
            entity_key
        HAVING
            COUNT(*) >= 2 
        )t1
    LEFT JOIN
        (
        SELECT entity_key, actual_posting_time
        FROM 
            myTable 
        WHERE _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
        ) t2
    ON t1.entity_key  = t2.entity_key
  AND min_time <> t2.actual_posting_time

So I want to delete from myTable every record returned by the subselect above. Any suggestions are much appreciated.

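1 answer:

Answer 0 (score: 0)

As I understand your query, you want to keep only the oldest record among the rows that share the same entity_key. In that case you can simply CONCAT the two fields, like this: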

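DELETE FROM myTable
WHERE
    -- Limit the delete to the same partitions the subquery inspects.
    _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
    AND CONCAT(CAST(entity_key AS STRING), '_', CAST(actual_posting_time AS STRING))
    NOT IN (
        -- One "key_minTime" string per entity_key: the row to keep.
        -- Every key must appear in this list, including keys with a single row;
        -- otherwise NOT IN would delete those single rows as well.
        SELECT
           CONCAT(CAST(entity_key AS STRING), '_', CAST(MIN(actual_posting_time) AS STRING))
        FROM
            myTable
        WHERE
            _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
            AND entity_key IS NOT NULL
        GROUP BY
            entity_key
    )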

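The second condition in the subquery's WHERE clause is needed because of how NOT IN behaves under standard SQL semantics, as described here: if the subquery returns even one NULL, NOT IN can never evaluate to TRUE, so no row would be deleted. Using a public dataset, you can preview the rows that would be deleted by running the same condition in a SELECT: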

#standardSQL
SELECT * 
FROM `bigquery-public-data.austin_311.311_service_requests` 
WHERE CONCAT(CAST(complaint_type as string), '_',CAST(status_change_date as string)) NOT IN (
    SELECT CONCAT(CAST(complaint_type as string), '_',CAST(min(status_change_date) as string))
    FROM `bigquery-public-data.austin_311.311_service_requests`
    WHERE complaint_type is not null
    GROUP BY complaint_type
)
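
To see in isolation why the IS NOT NULL filter matters, here is a minimal sketch that reads no table at all. The names keys, subquery_result, k and v are hypothetical and exist only for this illustration; the inline arrays stand in for the outer rows and for the list the NOT IN subquery returns.

```sql
#standardSQL
-- Hypothetical inline data; no real table is read.
WITH keys AS (
  SELECT k FROM UNNEST(['a', 'b', 'c']) AS k
),
subquery_result AS (
  -- One matching value plus a NULL, as a subquery over NULL keys would produce.
  SELECT v FROM UNNEST(['a', CAST(NULL AS STRING)]) AS v
)
SELECT
  k,
  -- The NULL in the list makes this NULL (never TRUE) for 'b' and 'c'.
  k NOT IN (SELECT v FROM subquery_result) AS unfiltered,
  -- With the NULL filtered out, 'b' and 'c' evaluate to TRUE.
  k NOT IN (SELECT v FROM subquery_result WHERE v IS NOT NULL) AS null_filtered
FROM keys
```

Under three-valued logic, a WHERE clause built on the unfiltered column would select nothing, which is why the answer excludes NULL keys inside the subquery.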

Another way to achieve this is to use EXISTS, as follows:

#standardSQL
WITH t1 AS (
  SELECT complaint_type, MIN(status_change_date) AS min_date
  FROM `bigquery-public-data.austin_311.311_service_requests`
  GROUP BY complaint_type )
SELECT *
FROM `bigquery-public-data.austin_311.311_service_requests` AS t2
WHERE NOT EXISTS (
  SELECT 1
  FROM t1
  WHERE t1.complaint_type = t2.complaint_type
    AND t1.min_date = t2.status_change_date 
)

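Note that on this public table the two results differ slightly, because some rows have a NULL status_change_date. NOT IN does not delete those rows, whereas NOT EXISTS does. For such a row the CONCAT result is NULL, so NOT IN yields NULL rather than TRUE, while the equality inside the NOT EXISTS subquery never matches and NOT EXISTS therefore returns TRUE.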
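
Carried back to the table from the question, the NOT EXISTS variant might look like the sketch below. It is untested and rests on several assumptions: that myTable is an ingestion-time partitioned table exposing _PARTITIONTIME, that BigQuery can de-correlate the subquery, and that rows with a NULL entity_key or actual_posting_time should be kept (the last two conditions are an added guard, not part of the original answer). The derived table is inlined rather than written as a WITH clause in front of the DELETE.

```sql
#standardSQL
-- Sketch only: table and column names come from the question.
DELETE FROM myTable t2
WHERE
  t2._PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
  -- Assumed guard: leave rows with NULL keys or timestamps untouched,
  -- mirroring what the NOT IN version does.
  AND t2.entity_key IS NOT NULL
  AND t2.actual_posting_time IS NOT NULL
  AND NOT EXISTS (
    SELECT 1
    FROM (
      -- Oldest posting time per entity_key within the same partitions.
      SELECT entity_key, MIN(actual_posting_time) AS min_time
      FROM myTable
      WHERE _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
      GROUP BY entity_key
    ) AS t1
    WHERE t1.entity_key = t2.entity_key
      AND t1.min_time = t2.actual_posting_time
  )
```

Replacing DELETE FROM myTable t2 with SELECT * FROM myTable AS t2 gives a preview of the affected rows before anything is removed.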