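My database holds statistics for many websites, and I am stuck on a fairly complex query that I don't know how to write (or whether it is even possible).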
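I have two tables: websites and visits. The former is a list of all websites and their properties, while the latter is a list of every visit a user has made to a particular website.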
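The program I am writing should pick out the websites that need to be "scanned". The interval between scans for each site depends on the site's total number of visits over the past 30 days. Here is a table with the intended scan intervals:
The tables have the following structure: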
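What I want is a query that returns the websites that are at or past their respective update deadline (which can be determined from the last_scanned column).
Can this be done easily in a single query?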
Answer 0: (score: 1)
Here is something you can try:
SELECT main.*
FROM (
    SELECT
        w.web_id,
        w.url,
        w.last_scanned,
        -- Visits to this site within the past 30 days
        (SELECT COUNT(*)
         FROM visits v
         WHERE v.web_id = w.web_id
           AND TIMESTAMPDIFF(DAY, v.added_on, NOW()) <= 30
        ) AS visit_count,
        -- Whole hours elapsed since the last scan
        TIMESTAMPDIFF(HOUR, w.last_scanned, NOW()) AS hrs_since_update
    FROM websites w
) main
WHERE
    -- Each branch pairs a visit-count range with its scan interval in hours
    (CASE
        WHEN visit_count >= 0 AND visit_count <= 10 AND hrs_since_update >= 4320 THEN 1
        WHEN visit_count >= 11 AND visit_count <= 100 AND hrs_since_update >= 2160 THEN 1
        WHEN visit_count >= 101 AND visit_count <= 500 AND hrs_since_update >= 1080 THEN 1
        WHEN visit_count >= 501 AND visit_count <= 1000 AND hrs_since_update >= 720 THEN 1
        WHEN visit_count >= 1001 AND visit_count <= 2000 AND hrs_since_update >= 360 THEN 1
        WHEN visit_count >= 2001 AND visit_count <= 5000 AND hrs_since_update >= 168 THEN 1
        WHEN visit_count >= 5001 AND visit_count <= 10000 AND hrs_since_update >= 72 THEN 1
        WHEN visit_count >= 10001 AND hrs_since_update >= 24 THEN 1
        ELSE 0
    END) = 1;
Here is a fiddle demo: http://sqlfiddle.com/#!9/1f671/1
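For readers who want to check the tier logic outside the database, the CASE expression above can be restated as a small lookup. This is only a sketch in Python: the names SCAN_TIERS, scan_interval_hours and is_due are made up for illustration, and the thresholds are copied from the query in this answer rather than from the original interval table.

```python
from datetime import datetime, timedelta

# (upper bound of the 30-day visit count, scan interval in hours),
# copied from the CASE branches in the query above.
# None marks the open-ended top tier (10001 visits or more).
SCAN_TIERS = [
    (10, 4320),
    (100, 2160),
    (500, 1080),
    (1000, 720),
    (2000, 360),
    (5000, 168),
    (10000, 72),
    (None, 24),
]


def scan_interval_hours(visit_count: int) -> int:
    """Return the scan interval in hours for a 30-day visit count."""
    for upper, hours in SCAN_TIERS:
        if upper is None or visit_count <= upper:
            return hours
    raise ValueError("unreachable: the last tier is open-ended")


def is_due(visit_count: int, last_scanned: datetime, now: datetime) -> bool:
    """True when the site is at or past its scan deadline."""
    interval = timedelta(hours=scan_interval_hours(visit_count))
    return now - last_scanned >= interval
```

For instance, is_due(20000, last_scanned, now) is true once 24 hours have passed, while a site with 5 visits only becomes due after 4320 hours.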
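Answer 1: (score: 0)
First, I would create a subquery that gets the number of visits for each distinct web_id from the visits table. Then LEFT OUTER JOIN the websites table to that subquery. You can then check the result against each possible condition in the visits-to-update-frequency table, like this: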
SELECT websites.*
FROM websites
LEFT OUTER JOIN (
    -- Count only visits from the past 30 days, as the question requires
    SELECT visits.web_id, COUNT(*) AS visits_count
    FROM visits
    WHERE visits.added_on >= DATE_SUB(NOW(), INTERVAL 30 DAY)
    GROUP BY visits.web_id
) v ON v.web_id = websites.web_id
WHERE
    -- COALESCE treats sites with no matching visits (NULL from the LEFT JOIN) as 0 visits
    (COALESCE(v.visits_count, 0) <= 10 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 4320 HOUR)) OR
    (v.visits_count BETWEEN 11 AND 100 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 2160 HOUR)) OR
    (v.visits_count BETWEEN 101 AND 500 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 1080 HOUR)) OR
    (v.visits_count BETWEEN 501 AND 1000 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 720 HOUR)) OR
    (v.visits_count BETWEEN 1001 AND 2000 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 360 HOUR)) OR
    (v.visits_count BETWEEN 2001 AND 5000 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 168 HOUR)) OR
    (v.visits_count BETWEEN 5001 AND 10000 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 72 HOUR)) OR
    (v.visits_count > 10000 AND websites.last_scanned <= DATE_SUB(NOW(), INTERVAL 24 HOUR));
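One thing worth watching with the LEFT OUTER JOIN approach is that a website with no rows in visits gets a NULL count, and NULL never satisfies a comparison such as `<= 10`. The sketch below illustrates this with Python's sqlite3 module and an in-memory database. SQLite stands in for MySQL here, and the two sample sites and the reduced column set are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (web_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE visits (web_id INTEGER);
    -- Hypothetical data: site 1 has three visits, site 2 has none
    INSERT INTO websites VALUES (1, 'a.example'), (2, 'b.example');
    INSERT INTO visits VALUES (1), (1), (1);
""")

JOINED = """
    FROM websites w
    LEFT OUTER JOIN (
        SELECT web_id, COUNT(*) AS visits_count FROM visits GROUP BY web_id
    ) v ON v.web_id = w.web_id
"""

# Raw count next to the COALESCE'd count for every site
rows = conn.execute(
    "SELECT w.web_id, v.visits_count, COALESCE(v.visits_count, 0)"
    + JOINED + "ORDER BY w.web_id"
).fetchall()

# Comparing the raw count silently drops the site without visits
low_raw = [r[0] for r in conn.execute(
    "SELECT w.web_id" + JOINED
    + "WHERE v.visits_count <= 10 ORDER BY w.web_id"
)]

# Wrapping the count in COALESCE keeps it in the lowest tier
low_fixed = [r[0] for r in conn.execute(
    "SELECT w.web_id" + JOINED
    + "WHERE COALESCE(v.visits_count, 0) <= 10 ORDER BY w.web_id"
)]

print(rows, low_raw, low_fixed)
```

The same NULL behaviour applies in MySQL, so the lowest tier is the place where COALESCE (or IFNULL) matters.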
Answer 2: (score: 0)
Just an improvement on @morgb's query, using a table for the visit count ranges:
CREATE TABLE visitCount (
    `min` BIGINT(20),
    `max` BIGINT(20),
    `frequency` BIGINT(20)  -- scan interval in hours
);

SELECT main.*
FROM (
    SELECT
        w.web_id,
        w.url,
        w.last_scanned,
        -- Visits to this site within the past 30 days
        (SELECT COUNT(*)
         FROM visits v
         WHERE v.web_id = w.web_id
           AND TIMESTAMPDIFF(DAY, v.added_on, NOW()) <= 30
        ) AS visit_count,
        -- Whole hours elapsed since the last scan
        TIMESTAMPDIFF(HOUR, w.last_scanned, NOW()) AS hrs_since_update
    FROM websites w
) main
INNER JOIN visitCount v
    ON main.visit_count BETWEEN v.`min` AND v.`max`
WHERE
    -- >= so that a site exactly at its deadline is returned as well
    main.hrs_since_update >= v.frequency;
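The range table has to be filled before the join returns anything, and the open-ended top tier needs some upper bound. Below is a minimal sketch of one way to populate and query it, again using Python's sqlite3 with an in-memory database in place of MySQL. The hour values are taken from the other answers in this thread. The TOP sentinel and the frequency_for helper are assumptions made for illustration, not part of the answer.

```python
import sqlite3

# Assumed sentinel for the open-ended top tier: the largest signed 64-bit integer
TOP = 2**63 - 1

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE visitCount ("min" INTEGER, "max" INTEGER, frequency INTEGER)'
)
conn.executemany(
    "INSERT INTO visitCount VALUES (?, ?, ?)",
    [
        (0, 10, 4320),
        (11, 100, 2160),
        (101, 500, 1080),
        (501, 1000, 720),
        (1001, 2000, 360),
        (2001, 5000, 168),
        (5001, 10000, 72),
        (10001, TOP, 24),
    ],
)


def frequency_for(visit_count: int) -> int:
    """Look up the scan interval (hours) whose range contains visit_count."""
    row = conn.execute(
        'SELECT frequency FROM visitCount WHERE ? BETWEEN "min" AND "max"',
        (visit_count,),
    ).fetchone()
    return row[0]


print(frequency_for(250))
```

Keeping the ranges contiguous and non-overlapping matters: a count that falls in a gap would drop the site from the inner join, and a count covered by two ranges would return the site twice.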