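I'm building a query that performs some filtering on rating data.
Suppose I have a simple table called ratings, like the one below, which stores data from an online rating tool: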
+----------------+----------------+--------+
| page_title     | timestamp      | rating |
+----------------+----------------+--------+
| Abc            | 20110417092134 |      1 |
| Abc            | 20110418110831 |      2 |
| Def            | 20110417092205 |      3 |
+----------------+----------------+--------+
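I need to extract pages that have a high frequency of low values among their latest 10 ratings, and to restrict this query to pages that produced at least 20 ratings in the preceding week. This is the ridiculously long query I came up with: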
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM
(
SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title
AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
)
AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY a1.page_title HAVING COUNT(*) > 20
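The top-level query looks for pages with more than 20 ratings in the week ending 2011-04-24. The subquery is supposed to retrieve, for each article returned by the top-level query, the number of ratings with a value in [1,2] among its latest 10 ratings.
MySQL complains that a1.page_title in the subquery's WHERE clause is an unknown column. I suspect this is because a1 is not defined as an alias in the second-level query, only in the top-level query, but I can't work out how to get around this.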
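(Edit)
To explain my suspicion about cross-level references, here is another query that works perfectly well. Note that a1 is not defined in the subquery here, but it is defined in its immediate parent: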
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17' AND rating >=1
AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.timestamp) >= '2011-04-11'
GROUP BY a1.page_title HAVING COUNT(*) > 20
Answer 0 (score: 5)
I think you might consider joining two inline views; it may make things a lot easier.
SELECT *
FROM (SELECT COUNT(*),
a2.page_title
FROM ratings a2
WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17'
AND rating >= 1
AND rating <= 2
GROUP BY a2.page_title) current
JOIN
(SELECT a1.page_title,
COUNT(*) AS rvol,
AVG(a1.rating) AS theavg
FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17'
AND DATE(a1.timestamp) >= '2011-04-11'
GROUP BY a1.page_title
HAVING COUNT(*) > 20) morethan20
ON current.page_title = morethan20.page_title
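To make the shape of this join easier to try out, here is a minimal sketch that runs the same kind of query against a throwaway in-memory database. Everything around the SQL is an assumption for illustration: it uses Python's sqlite3 module as a stand-in for MySQL, the rows are synthetic, the helper name low_counts and the aliases low and busy are made up, both derived tables cover the same week, and the threshold is lowered to 2 so the tiny data set produces output. It also assumes timestamps are fixed-width YYYYMMDDHHMMSS strings, so they are compared directly instead of through DATE().

```python
import sqlite3

# Synthetic rows (page_title, timestamp, rating); not taken from the question.
ROWS = [
    ("Abc", "20110417092134", 1),
    ("Abc", "20110418110831", 2),
    ("Abc", "20110419120000", 5),
    ("Def", "20110417092205", 3),
    ("Def", "20110420080000", 1),
    ("Ghi", "20110418070000", 2),
]

# Two derived tables joined on page_title: "low" counts ratings of 1 or 2,
# "busy" keeps only pages with at least :min_ratings ratings in the window.
QUERY = """
SELECT busy.page_title, busy.rvol, busy.theavg, low.lowest
FROM (SELECT page_title, COUNT(*) AS lowest
      FROM ratings
      WHERE timestamp >= :week_start AND timestamp < :week_end
        AND rating BETWEEN 1 AND 2
      GROUP BY page_title) AS low
JOIN (SELECT page_title, COUNT(*) AS rvol, AVG(rating) AS theavg
      FROM ratings
      WHERE timestamp >= :week_start AND timestamp < :week_end
      GROUP BY page_title
      HAVING COUNT(*) >= :min_ratings) AS busy
  ON low.page_title = busy.page_title
ORDER BY busy.page_title
"""


def low_counts(rows, week_start, week_end, min_ratings):
    """Load rows into an in-memory table and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (page_title TEXT, timestamp TEXT, rating INTEGER)"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    params = {
        "week_start": week_start,
        "week_end": week_end,
        "min_ratings": min_ratings,
    }
    result = conn.execute(QUERY, params).fetchall()
    conn.close()
    return result


result = low_counts(ROWS, "20110417000000", "20110425000000", 2)
for page_title, rvol, theavg, lowest in result:
    print(page_title, rvol, round(theavg, 2), lowest)
```

Because this is an inner join, a page with no ratings of 1 or 2 in the window drops out of the result entirely; a LEFT JOIN from the "busy" side would keep it with a NULL count if that matters.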
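Answer 1 (score: 1)
If there is only this one simple table, I don't know where you are pulling all these other table names from, e.g. a1, a2, rating. It feels like your SQL is a bit off, or you are leaving out information.
The reason you are getting the error is that you did not include a1 in the FROM clause of the sub-subquery. Since the table is not included there, you cannot reference it in the WHERE clause of that subquery.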
SELECT *
FROM
(SELECT *
FROM a1
WHERE a1.timestamp <= (NOW()-604800)
AND a1.timestamp >= (NOW()-1209600)
GROUP BY a1.page_title
HAVING COUNT(a1.page_title)>20)
AS priorWeekCount
WHERE
rating <= 2
ORDER BY timestamp DESC
LIMIT 10
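Since I don't have a full table to test this against, I think this is what you are looking for, but it is untested, and knowing my coding habits, it is rarely 100% right the first time I type it ;)
Answer 2 (score: 1)
Your analysis of the error is correct: lowest is known in the subquery, a1 is not.
I think the logic is inside out. The following may not be optimal, but the optimizer may be smart enough to combine the two subqueries in the outermost SELECT. (If not, at some cost to readability, you could introduce another level of subquery.)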
SELECT r20plus.page_title,
AVG((SELECT rating
FROM ratings r WHERE r.page_title=r20plus.page_title
ORDER BY timestamp DESC LIMIT 10) ) as av,
SUM((SELECT CASE WHEN rating BETWEEN 1 AND 2 THEN 1 ELSE 0 END
FROM ratings r WHERE r.page_title=r20plus.page_title
ORDER BY timestamp DESC LIMIT 10) ) as n_low
FROM
(SELECT page_title FROM ratings
WHERE DATE(timestamp) <= "2011-04-24" AND DATE(timestamp) >= "2011-04-17"
GROUP BY page_title
HAVING COUNT(rating) >= 20) AS r20plus;
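One way to get "the latest N ratings per page" without a LIMIT inside a doubly nested subquery is to keep a row only when fewer than N newer rows exist for the same page. This is a different technique from the query above, and it only ever correlates with the immediate parent, which is the case the question reports as working. The sketch below is an illustration under assumptions: it runs on Python's sqlite3 as a stand-in for MySQL, the data is synthetic, the aliases busy and newer and the helper name latest_low_counts are made up, N and the threshold are both lowered to 3, and timestamps are assumed to be unique per page and stored as fixed-width YYYYMMDDHHMMSS strings.

```python
import sqlite3

# Synthetic rows (page_title, timestamp, rating), oldest first per page.
ROWS = [
    ("Abc", "20110417090000", 5),
    ("Abc", "20110418090000", 5),
    ("Abc", "20110419090000", 1),
    ("Abc", "20110420090000", 2),
    ("Abc", "20110421090000", 1),
    ("Def", "20110417100000", 1),
    ("Def", "20110418100000", 1),
    ("Def", "20110419100000", 4),
    ("Def", "20110420100000", 5),
    ("Ghi", "20110418110000", 1),
]

# "busy" keeps pages with enough ratings in the week. The correlated
# subquery counts newer ratings of the same page, so a row survives only
# if it is among the :latest_n most recent ones before :week_end.
QUERY = """
SELECT r.page_title,
       COUNT(*) AS n_latest,
       SUM(CASE WHEN r.rating BETWEEN 1 AND 2 THEN 1 ELSE 0 END) AS n_low
FROM ratings AS r
JOIN (SELECT page_title
      FROM ratings
      WHERE timestamp >= :week_start AND timestamp < :week_end
      GROUP BY page_title
      HAVING COUNT(*) >= :min_ratings) AS busy
  ON busy.page_title = r.page_title
WHERE r.timestamp < :week_end
  AND (SELECT COUNT(*)
       FROM ratings AS newer
       WHERE newer.page_title = r.page_title
         AND newer.timestamp < :week_end
         AND newer.timestamp > r.timestamp) < :latest_n
GROUP BY r.page_title
ORDER BY r.page_title
"""


def latest_low_counts(rows, week_start, week_end, min_ratings, latest_n):
    """Return (page_title, n_latest, n_low) for each sufficiently rated page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (page_title TEXT, timestamp TEXT, rating INTEGER)"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    params = {
        "week_start": week_start,
        "week_end": week_end,
        "min_ratings": min_ratings,
        "latest_n": latest_n,
    }
    result = conn.execute(QUERY, params).fetchall()
    conn.close()
    return result


result = latest_low_counts(ROWS, "20110417000000", "20110425000000", 3, 3)
for row in result:
    print(row)
```

If two ratings of the same page share a timestamp they get the same rank, so more than N rows can slip through; a tie-breaker column would be needed in that case. The per-row count can also be slow on a large table without an index on (page_title, timestamp).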