MySQL subquery referring to a field in the parent query

Asked: 2011-04-25 17:55:02

Tags: mysql subquery

I am building a query that performs some filtering on rating data.

Suppose I have a simple table called ratings, like the following, storing data from an online rating tool:

+----------------+----------------+--------+
| page_title     | timestamp      | rating |
+----------------+----------------+--------+
| Abc            | 20110417092134 | 1      |
| Abc            | 20110418110831 | 2      |
| Def            | 20110417092205 | 3      |
+----------------+----------------+--------+
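
For anyone who wants to reproduce the problem, the table above could be sketched roughly as follows. The column types are assumptions, since the question does not state them; here the timestamp is stored as a 14-character YYYYMMDDhhmmss string, a format that MySQL's DATE() can parse.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the question.
CREATE TABLE ratings (
    page_title  VARCHAR(255) NOT NULL,
    `timestamp` CHAR(14)     NOT NULL,  -- YYYYMMDDhhmmss, e.g. 20110417092134
    rating      TINYINT      NOT NULL
);

-- The three sample rows shown in the table above
INSERT INTO ratings (page_title, `timestamp`, rating) VALUES
    ('Abc', '20110417092134', 1),
    ('Abc', '20110418110831', 2),
    ('Def', '20110417092205', 3);

-- DATE() turns the 14-digit string into a date, so date filters work directly
SELECT page_title, DATE(`timestamp`) AS rated_on, rating
FROM ratings
WHERE DATE(`timestamp`) <= '2011-04-24';
```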

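I need to extract pages with a high frequency of low values among their latest 10 ratings, and to limit this query to pages that received at least 20 ratings in the preceding week. This is the absurdly long query I came up with: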

SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg, 
(
     SELECT COUNT(*) FROM
     (
         SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title 
         AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
     ) 
     AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17" 
GROUP BY a1.page_title HAVING COUNT(*) > 20

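The top-level query looks for pages with more than 20 ratings in the week ending on 2011-04-24. The subquery is supposed to retrieve, for each page returned by the top-level query, the number of ratings with values in [1,2] among its latest 10 ratings.

MySQL complains that a1.page_title in the WHERE clause of the subquery is an unknown column. I suspect this is because a1 is not defined as an alias in the second-level query, only in the top-level one, but I am clueless about how to fix this.

(edited)

To explain my suspicion about cross-level referencing, I am adding another query that works totally fine. Note that here a1 is not defined in the subquery, but it is in the immediate parent: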

SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg, 
(
    SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
    AND DATE(timestamp) >= '2011-04-17' AND rating >=1 
    AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1 
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.timestamp) >= '2011-04-11' 
GROUP BY a1.page_title HAVING COUNT(*) > 20

3 Answers:

Answer 0 (score: 5)

I think you might consider joining two inline views; it might make things easier.

SELECT * 
FROM   (SELECT COUNT(*) AS lowest, 
               a2.page_title 
        FROM   ratings a2 
        WHERE  DATE(timestamp) <= '2011-04-24' 
               AND DATE(timestamp) >= '2011-04-17' 
               AND rating >= 1 
               AND rating <= 2 
        GROUP  BY a2.page_title) current 
       JOIN 
        (SELECT a1.page_title, 
                COUNT(*)       AS rvol, 
                AVG(a1.rating) AS theavg 
         FROM   ratings a1 
         WHERE  DATE(a1.timestamp) <= '2011-04-17' 
                AND DATE(a1.timestamp) >= '2011-04-11' 
         GROUP  BY a1.page_title 
         HAVING COUNT(*) > 20) morethan20 
         ON current.page_title = morethan20.page_title 

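Answer 1 (score: 1)

If you only have the one simple table, I have no idea where you are pulling all these other table names from, such as a1, a2 and rating. I feel like either your SQL is a little off or you left out some information.

The reason you are getting the error is that in your sub-subquery you are not including a1 in the FROM clause. Since that table is not included there, it cannot be referenced in the WHERE clause of that subquery.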

SELECT * 
FROM
    (SELECT *
        FROM ratings a1
        -- between two weeks ago and one week ago
        WHERE a1.timestamp <= (NOW() - INTERVAL 7 DAY)
            AND a1.timestamp >= (NOW() - INTERVAL 14 DAY)
        GROUP BY a1.page_title
        HAVING COUNT(a1.page_title)>20)
    AS priorWeekCount
WHERE
    rating <= 2
ORDER BY timestamp DESC
LIMIT 10

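Since I don't have a full table to test this on, I think this is what you are looking for, but it is untested, and knowing my coding habits, what I type is rarely 100% perfect the first time ;)

Answer 2 (score: 1)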

Your analysis of the error is correct: lowest is known in the subquery, a1 is not.
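
To be a bit more precise about the cause: before MySQL 8.0.14 a derived table (a subquery in a FROM clause) cannot contain outer references, so a1 is invisible inside the derived table named latest, whereas a correlated subquery in a WHERE clause or SELECT list can see it. One possible workaround, sketched here under the assumption that the ratings table looks like the one in the question, is to avoid the derived table altogether: instead of ORDER BY ... LIMIT 10, treat a row as one of the latest 10 when fewer than 10 rows for the same page are newer. The alias a3 is introduced only for illustration, and rows that tie on timestamp can let more than 10 rows through.

```sql
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
    -- Low ratings among the latest 10 for this page, without a derived table
    SELECT COUNT(*)
    FROM ratings a2
    WHERE a2.page_title = a1.page_title
      AND DATE(a2.timestamp) <= '2011-04-24'
      AND a2.rating BETWEEN 1 AND 2
      -- a2 is among the latest 10 if fewer than 10 rows of the page are newer
      AND (
          SELECT COUNT(*)
          FROM ratings a3
          WHERE a3.page_title = a2.page_title
            AND DATE(a3.timestamp) <= '2011-04-24'
            AND a3.timestamp > a2.timestamp
      ) < 10
) AS lowest
FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-24' AND DATE(a1.timestamp) >= '2011-04-17'
GROUP BY a1.page_title
HAVING COUNT(*) > 20;
```

Every outer reference here sits in a WHERE clause, which is why this form is accepted. The innermost count runs once per candidate row, so it may be slow on a large table without an index on (page_title, timestamp).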

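I think the logic is inside-out. The following might not be the best, but the optimizer might be clever enough to combine the two subqueries in the outermost SELECT. (If not, at the cost of some readability, you could introduce another level of subquery.)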

SELECT r20plus.page_title,
 AVG((SELECT rating 
      FROM ratings r WHERE r.page_title=r20plus.page_title 
      ORDER BY timestamp DESC LIMIT 10) ) as av,
 SUM((SELECT CASE WHEN rating BETWEEN 1 AND 2 THEN 1 ELSE 0 END 
      FROM ratings r WHERE r.page_title=r20plus.page_title
      ORDER BY timestamp DESC LIMIT 10) ) as n_low
FROM
(SELECT page_title FROM ratings  
WHERE DATE(timestamp) <= "2011-04-24" AND DATE(timestamp) >= "2011-04-17"
GROUP BY page_title
HAVING COUNT(rating) >= 20) AS r20plus;
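
One caveat with the query above: each scalar subquery ends in LIMIT 10 and can therefore return up to ten rows, which MySQL rejects where a single value is expected. The same inside-out idea can be sketched with window functions, assuming MySQL 8.0 or later (which did not exist when this was asked). The names ranked and rn are illustrative, and the cutoff date on the latest 10 ratings is carried over from the question rather than from this answer.

```sql
-- Requires MySQL 8.0+ (window functions); untested against the asker's data.
SELECT r20plus.page_title,
       AVG(ranked.rating)                 AS av,     -- average of the latest 10
       SUM(ranked.rating BETWEEN 1 AND 2) AS n_low   -- low ratings among them
FROM (
    -- Pages with at least 20 ratings in the week ending 2011-04-24
    SELECT page_title
    FROM ratings
    WHERE DATE(`timestamp`) BETWEEN '2011-04-17' AND '2011-04-24'
    GROUP BY page_title
    HAVING COUNT(rating) >= 20
) AS r20plus
JOIN (
    -- Number each page's ratings from newest (1) to oldest
    SELECT page_title, rating,
           ROW_NUMBER() OVER (PARTITION BY page_title
                              ORDER BY `timestamp` DESC) AS rn
    FROM ratings
    WHERE DATE(`timestamp`) <= '2011-04-24'
) AS ranked
  ON ranked.page_title = r20plus.page_title
WHERE ranked.rn <= 10
GROUP BY r20plus.page_title;
```

Neither derived table refers to an outer alias, so the restriction discussed in the question does not come into play. SUM over a comparison works because MySQL evaluates a true condition as 1 and a false one as 0.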