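I have the following query. I want to fetch 100 items from the database, but host_id appears many times in the urls table, and I want at most 10 unique rows from that table per host_id.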
select *
from urls
join hosts using(host_id)
where
(
last_run_date is null
or last_run_date <= date_sub(curdate(), interval 30 day)
)
and ignore_url != 1
limit 100
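So, I would like at most 100 rows overall, with no more than 10 rows per host_id.
I'm not sure what I need to do to accomplish this. Is there a subquery that can do it?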
CREATE TABLE `hosts` (
`host_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`host` VARCHAR(50) NOT NULL,
`last_fetched` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`ignore_host` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`host_id`),
UNIQUE INDEX `host` (`host`)
);
CREATE TABLE `urls` (
`url_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`parent_url_id` INT(10) UNSIGNED NOT NULL,
`scheme` VARCHAR(5) NOT NULL,
`host_id` INT(10) UNSIGNED NOT NULL,
`path` VARCHAR(500) NOT NULL,
`query` VARCHAR(500) NOT NULL,
`date_found` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`last_run_date` DATETIME NULL DEFAULT NULL,
`ignore_url` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`url_id`),
UNIQUE INDEX `host_path_query` (`host_id`, `path`, `query`)
);
Answer 0 (score: 1)
This should do it (I hope).
I can't test it against real data because I don't have any. Please test it and give me a quick ping.
SELECT *
FROM (
SELECT
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
) AS t
LEFT JOIN hosts USING(host_id)
WHERE t.nr < 11
LIMIT 100;
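Since the query above was posted untested, one way to check it is to count the rows that survive the nr filter for each host. The following is only a sketch, not part of the original answer: it reuses the inner query unchanged, drops the join and the LIMIT, and introduces the column alias rows_kept purely for illustration. If it returns no rows, no host_id exceeded the cap of 10 for the current data.

```sql
-- Sanity check (hypothetical helper query, not from the original answer):
-- list every host_id that still has more than 10 rows after the nr filter.
-- An empty result means the per-host cap held for the current data.
SELECT t.host_id, COUNT(*) AS rows_kept
FROM (
    SELECT
        @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
        u.*,
        @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
    FROM
        urls u,
        ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
    WHERE (
            last_run_date IS NULL
            OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
          )
      AND ignore_url != 1
    ORDER BY host_id, last_run_date
) AS t
WHERE t.nr < 11
GROUP BY t.host_id
HAVING COUNT(*) > 10;
```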
OK.
First:
I select only the rows matching your query and order them by host_id and time (last_run_date).
SELECT
u.*
FROM
urls u
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
Second:
I add the variables nr and lasthost and set them in the select. Now I count each row and reset the counter to 1 whenever host_id changes. So I get a list of rows numbered from 1 to n for each host_id.
SELECT
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
Third:
I wrap this query in a new select so that I can join your second table, keep only the rows whose nr is less than 11, and limit the result to 100.
SELECT *
FROM (
SELECT
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
) AS t
LEFT JOIN hosts USING(host_id)
WHERE t.nr < 11
LIMIT 100;
That's all.
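As a possible alternative to the user-variable approach in this answer, the same per-host numbering can be written with a window function. This is a sketch under the assumption that the server runs MySQL 8.0 or later, where ROW_NUMBER() and common table expressions are available. The CTE name numbered and the url_id tie-breaker are my own additions, not part of the original answer. It may be worth considering because the MySQL manual describes the evaluation order of expressions involving user variables as undefined, and assigning user variables inside a SELECT is deprecated in recent 8.0 releases.

```sql
-- Assumes MySQL 8.0+ (window functions and CTEs are not available in 5.x).
WITH numbered AS (
    SELECT
        u.*,
        -- Number the rows of each host separately, starting at 1.
        -- MySQL sorts NULL first in ascending order, so URLs that have
        -- never been run come first; url_id is an added tie-breaker.
        ROW_NUMBER() OVER (
            PARTITION BY u.host_id
            ORDER BY u.last_run_date, u.url_id
        ) AS nr
    FROM urls u
    WHERE (
            u.last_run_date IS NULL
            OR u.last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
          )
      AND u.ignore_url != 1
)
SELECT *
FROM numbered n
LEFT JOIN hosts USING (host_id)
WHERE n.nr <= 10   -- at most 10 rows per host_id
LIMIT 100;         -- at most 100 rows overall
```

As in the original query, the outer LIMIT 100 has no ORDER BY, so which 100 rows come back is up to the server. Add an outer ORDER BY if a particular selection of hosts matters.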