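Question
I have collected a set of rows using a join, and I need to delete those rows with a query. Does anyone know how to write that DELETE query? I know it sounds simple, but I can't find a way to do it.
Code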
SELECT * FROM (
  SELECT
    entity_key, MIN(actual_posting_time) AS min_time
  FROM
    myTable
  WHERE
    _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
  GROUP BY
    entity_key
  HAVING
    COUNT(*) >= 2
) t1
LEFT JOIN
(
  SELECT entity_key, actual_posting_time
  FROM
    myTable
  WHERE _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
) t2
ON t1.entity_key = t2.entity_key
AND t1.min_time <> t2.actual_posting_time
So I want to delete from myTable every record returned by the sub-select above. Any suggestions are much appreciated.
Answer 0 (score: 0)
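As I understand your query, you want to keep only the oldest record among the rows that share the same entity_key. In that case you can simply CONCAT the two fields and compare the result, like this: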
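DELETE FROM myTable
WHERE
  -- Only touch the partitions that the subquery inspects.
  _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
  AND CONCAT(CAST(entity_key AS STRING), '_', CAST(actual_posting_time AS STRING))
  NOT IN (
    SELECT
      CONCAT(CAST(entity_key AS STRING), '_', CAST(MIN(actual_posting_time) AS STRING))
    FROM
      myTable
    WHERE
      _PARTITIONTIME BETWEEN TIMESTAMP("2018-12-01") AND TIMESTAMP("2018-12-04")
      AND entity_key IS NOT NULL
    -- No HAVING COUNT(*) >= 2 here: a key with a single row must also be
    -- listed, otherwise NOT IN would delete that row too.
    GROUP BY
      entity_key
  )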
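The second condition in the subquery's WHERE clause (entity_key IS NOT NULL) is needed because of how NOT IN treats NULL in standard SQL, as described here. Using a public dataset, you can preview the rows that would be deleted by running the same condition as a SELECT: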
#standardSQL
SELECT *
FROM `bigquery-public-data.austin_311.311_service_requests`
WHERE CONCAT(CAST(complaint_type as string), '_',CAST(status_change_date as string)) NOT IN (
SELECT CONCAT(CAST(complaint_type as string), '_',CAST(min(status_change_date) as string))
FROM `bigquery-public-data.austin_311.311_service_requests`
WHERE complaint_type is not null
GROUP BY complaint_type
)
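The need for the IS NOT NULL filter in the subquery can be checked on a tiny in-memory table. The following is a minimal sketch, not BigQuery itself: it uses Python's built-in sqlite3 module as a local stand-in (an assumption made only so the example runs anywhere; the three-valued logic of NOT IN is standard SQL). The table names candidates and keepers and their values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: rows we might delete, and keys we want to keep.
conn.execute("CREATE TABLE candidates (k TEXT)")
conn.execute("CREATE TABLE keepers (k TEXT)")
conn.executemany("INSERT INTO candidates VALUES (?)", [("a",), ("b",)])
conn.executemany("INSERT INTO keepers VALUES (?)", [("a",), (None,)])

# The subquery returns a NULL, so 'b' NOT IN ('a', NULL) evaluates to NULL
# rather than TRUE, and no row passes the filter.
with_null = conn.execute(
    "SELECT k FROM candidates WHERE k NOT IN (SELECT k FROM keepers)"
).fetchall()

# Filtering NULLs out of the subquery restores the intended behaviour.
without_null = conn.execute(
    "SELECT k FROM candidates "
    "WHERE k NOT IN (SELECT k FROM keepers WHERE k IS NOT NULL)"
).fetchall()

print(with_null)
print(without_null)
```

Without the filter, a single NULL in the subquery result makes the NOT IN condition match nothing, so a DELETE built on it would silently remove no rows.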
Another way to achieve this is to use EXISTS, as follows:
#standardSQL
WITH t1 AS (
SELECT complaint_type, MIN(status_change_date) AS min_date
FROM `bigquery-public-data.austin_311.311_service_requests`
GROUP BY complaint_type )
SELECT *
FROM `bigquery-public-data.austin_311.311_service_requests` AS t2
WHERE NOT EXISTS (
SELECT 1
FROM t1
WHERE t1.complaint_type = t2.complaint_type
AND t1.min_date = t2.status_change_date
)
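Note that on this public table the two results differ slightly, because some rows have a NULL status_change_date. For those rows CONCAT returns NULL, so the NOT IN comparison evaluates to NULL and the row is not selected, whereas NOT EXISTS does select it. In other words, NOT IN would not delete those rows, while NOT EXISTS would.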
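The difference between the two approaches on rows with a NULL date can be reproduced locally. This is a rough sketch under stated assumptions: it again uses Python's sqlite3 module instead of BigQuery, the requests table and its four rows are invented for illustration, and SQLite's || operator stands in for CONCAT (both yield NULL when an operand is NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (complaint_type TEXT, status_change_date TEXT)"
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        ("noise", "2018-01-01"),     # oldest row for 'noise'
        ("noise", "2018-01-02"),     # newer duplicate
        ("noise", None),             # row with a NULL date
        ("graffiti", "2018-02-01"),  # only row for 'graffiti'
    ],
)

# NOT IN on a concatenated key: the NULL-date row produces a NULL key,
# so the comparison is NULL and the row is not selected.
not_in = conn.execute("""
    SELECT complaint_type, status_change_date FROM requests
    WHERE complaint_type || '_' || status_change_date NOT IN (
        SELECT complaint_type || '_' || MIN(status_change_date)
        FROM requests
        WHERE complaint_type IS NOT NULL
        GROUP BY complaint_type)
    ORDER BY rowid
""").fetchall()

# NOT EXISTS: no row of t1 can equal a NULL date, so the NULL-date row
# has no match and is selected.
not_exists = conn.execute("""
    WITH t1 AS (
        SELECT complaint_type, MIN(status_change_date) AS min_date
        FROM requests
        GROUP BY complaint_type)
    SELECT complaint_type, status_change_date FROM requests AS t2
    WHERE NOT EXISTS (
        SELECT 1 FROM t1
        WHERE t1.complaint_type = t2.complaint_type
          AND t1.min_date = t2.status_change_date)
    ORDER BY rowid
""").fetchall()

print(not_in)
print(not_exists)
```

Both queries select the newer duplicate, but only the NOT EXISTS version also selects the row whose date is NULL. Which behaviour is preferable depends on whether rows without a timestamp should survive the cleanup.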