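I need to find groups of results that share a key (here, account_id) and whose rows were all posted to my local data store's API within the past 90 days, but none within the past 7 days.
This is a valid group that I need to capture for account 1234: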
```
account_id,company_id,posted_date
1234,A,2018-02-28
1234,B,2018-03-13
1234,C,2018-04-23
1234,D,2018-05-15
```
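This is an invalid group. If any single date falls outside the query's upper or lower bound, the account ID should be excluded from the final results: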
```
account_id,company_id,posted_date
5678,Z,2018-02-01
5678,Y,2018-03-13
5678,X,2018-04-23
5678,W,2018-05-21
```
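The rule above comes down to a check on each group's earliest and latest dates: if both are inside the window, every date between them is too. A minimal Python sketch, assuming a fixed, hypothetical reference date of 2018-05-25 in place of the current date (chosen so the two sample groups come out as described) and treating both bounds as inclusive:

```python
from datetime import date, timedelta

# Hypothetical fixed "today" so the sample data behaves as described;
# the real query would use the current date instead.
TODAY = date(2018, 5, 25)


def group_is_valid(posted_dates, today=TODAY):
    """Return True if every date lies in [today - 90 days, today - 7 days]."""
    lower = today - timedelta(days=90)  # 2018-02-24 for the assumed TODAY
    upper = today - timedelta(days=7)   # 2018-05-18 for the assumed TODAY
    # Only the earliest and latest dates need checking.
    return lower <= min(posted_dates) and max(posted_dates) <= upper


valid_group = [date(2018, 2, 28), date(2018, 3, 13), date(2018, 4, 23), date(2018, 5, 15)]
invalid_group = [date(2018, 2, 1), date(2018, 3, 13), date(2018, 4, 23), date(2018, 5, 21)]

print(group_is_valid(valid_group))    # True
print(group_is_valid(invalid_group))  # False
```

Account 5678 fails on both ends: 2018-02-01 is older than the lower bound and 2018-05-21 is newer than the upper bound.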
Here is my first draft of the query, using subqueries:
```sql
-- First draft: filter accounts with two subqueries.
SELECT account_id, company_id
FROM local_data_store.result_api
WHERE account_id NOT IN (
    -- accounts with anything posted in the last 7 days
    SELECT account_id FROM local_data_store.result_api
    WHERE posted_date > DATE_SUB(NOW(), INTERVAL 7 DAY)
)
AND account_id IN (
    -- accounts with anything posted in the last 90 days
    SELECT account_id FROM local_data_store.result_api
    WHERE posted_date > DATE_SUB(NOW(), INTERVAL 90 DAY)
)
GROUP BY account_id, company_id
LIMIT 100000;
```
Here is the query I'm working on now, without subqueries (I tried joins, but they really didn't work):
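```sql
-- Work in progress: one pass over the table, no subqueries.
-- In MySQL a comparison yields 1 or 0, so SUM() counts the rows where it is
-- true; COUNT() would count every non-NULL result, true or false.
SELECT account_id, company_id,
       SUM(ra1.posted_date > DATE_SUB(NOW(), INTERVAL 90 DAY)) AS day90,
       SUM(ra1.posted_date > DATE_SUB(NOW(), INTERVAL 7 DAY)) AS day7
FROM local_data_store.result_api ra1
GROUP BY account_id, company_id;
```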
But it runs so long that the database connection times out, and the table has only 375,000 rows.
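One way to avoid both the subqueries and a per-date grouping is a single GROUP BY on the account with a HAVING test on MIN and MAX. This is a swapped-in technique, not the asker's query. The sketch uses Python's built-in sqlite3 module with an in-memory table standing in for local_data_store.result_api, SQLite's date() function in place of MySQL's DATE_SUB(NOW(), ...), and a hypothetical fixed reference date of 2018-05-25:

```python
import sqlite3

# In-memory stand-in for local_data_store.result_api (sample rows from the question).
rows = [
    (1234, "A", "2018-02-28"), (1234, "B", "2018-03-13"),
    (1234, "C", "2018-04-23"), (1234, "D", "2018-05-15"),
    (5678, "Z", "2018-02-01"), (5678, "Y", "2018-03-13"),
    (5678, "X", "2018-04-23"), (5678, "W", "2018-05-21"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_api (account_id INTEGER, company_id TEXT, posted_date TEXT)")
conn.executemany("INSERT INTO result_api VALUES (?, ?, ?)", rows)

# One scan, no subquery: an account qualifies only if its earliest post is
# no older than 90 days and its latest post is at least 7 days old.
# ISO-formatted date strings compare correctly as text.
# :today is a hypothetical fixed date; MySQL would use NOW() instead.
query = """
SELECT account_id, COUNT(*) AS n_rows
FROM result_api
GROUP BY account_id
HAVING MIN(posted_date) >= date(:today, '-90 days')
   AND MAX(posted_date) <= date(:today, '-7 days')
"""
result = conn.execute(query, {"today": "2018-05-25"}).fetchall()
print(result)  # [(1234, 4)]
```

This returns qualifying accounts only; their company_id rows can be listed by joining the result back to the table. In MySQL, a composite index on (account_id, posted_date) may let the server read each account's MIN and MAX from the index rather than scanning every row.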
Answer (score: 0)
Here is the solution I came up with. I hope it helps someone else.
```sql
-- Keep only accounts with no row outside the window
-- (older than 60 days or newer than 4 days), then count jobs per company.
SELECT account_did, cb_company_id, COUNT(did) AS `# of jobs`
FROM cb_local_data_store.job_search_result_api
WHERE account_did NOT IN (
    SELECT account_did FROM cb_local_data_store.job_search_result_api
    WHERE posted_date < DATE_SUB(NOW(), INTERVAL 60 DAY)
       OR posted_date > DATE_SUB(NOW(), INTERVAL 4 DAY)
)
GROUP BY account_did, cb_company_id;
```
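The answer's NOT IN approach can be checked against the sample rows from the question. This sketch uses Python's sqlite3 with an in-memory table named after the question's schema (result_api, account_id, company_id) rather than the answer's cb_ tables, the question's 90/7-day window instead of the answer's 60/4, SQLite's date() in place of DATE_SUB(NOW(), ...), and a hypothetical fixed reference date of 2018-05-25:

```python
import sqlite3

# In-memory stand-in for the table (sample rows from the question).
rows = [
    (1234, "A", "2018-02-28"), (1234, "B", "2018-03-13"),
    (1234, "C", "2018-04-23"), (1234, "D", "2018-05-15"),
    (5678, "Z", "2018-02-01"), (5678, "Y", "2018-03-13"),
    (5678, "X", "2018-04-23"), (5678, "W", "2018-05-21"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_api (account_id INTEGER, company_id TEXT, posted_date TEXT)")
conn.executemany("INSERT INTO result_api VALUES (?, ?, ?)", rows)

# Same shape as the answer: drop every account that has at least one row
# outside the window, then count rows per (account, company).
# :today is a hypothetical fixed date; the answer uses NOW() in MySQL.
query = """
SELECT account_id, company_id, COUNT(*) AS n_jobs
FROM result_api
WHERE account_id NOT IN (
    SELECT account_id FROM result_api
    WHERE posted_date < date(:today, '-90 days')
       OR posted_date > date(:today, '-7 days')
)
GROUP BY account_id, company_id
ORDER BY account_id, company_id
"""
result = conn.execute(query, {"today": "2018-05-25"}).fetchall()
print(result)  # [(1234, 'A', 1), (1234, 'B', 1), (1234, 'C', 1), (1234, 'D', 1)]
```

One caveat with NOT IN: if the subquery ever returns a NULL account ID, the NOT IN test can never be true and the query returns no rows at all; a NOT EXISTS subquery does not have that problem.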