MySQL greatest-n-per-group trouble

Posted: 2010-10-06 04:52:40

Tags: sql mysql database greatest-n-per-group

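Hey everyone, I believe this is a "greatest-n-per-group" problem, but even after looking at several questions on StackOverflow, I'm not sure how to apply it to my situation...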

I'm using a MySQL database and have a basic blog-type system about computer applications... The tables look like this:

POSTS
post_id
post_created
post_type      -- could be article, review, feature, whatever
post_status    -- 'a' approved or 'd' for draft

APPS
app_id 
app_name
app_platform   -- Windows, linux, unix, etc.

APP_TO_POST    -- links each post to its relevant application
atp_id
atp_app_id
atp_post_id

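I'm using the following basic query to pull all articles for the application named "Photoshop", where the post type is "Article" and the post status is 'a' for approved: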

SELECT apps.app_name, apps.app_platform, posts.post_created, posts.post_id
FROM apps
JOIN app_to_post ON app_to_post.atp_app_id = apps.app_id
JOIN posts ON app_to_post.atp_post_id = posts.post_id
WHERE apps.app_name = 'Photoshop'
  AND posts.post_type = 'Article'
  AND posts.post_status = 'a'

This gives me the expected results:

app_name    app_platform   post_created      post_id
Photoshop   Windows        Oct. 20th, 2009   1
Photoshop   Windows        Dec. 1, 2009      3
Photoshop   Macintosh      Nov. 10th, 2009   2

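Could anyone help me change the query so that it pulls only the most recent article for each application platform? For example, I'd like my results to look like this: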

app_name    app_platform   post_created      post_id
Photoshop   Windows        Dec. 1, 2009      3
Photoshop   Macintosh      Nov. 10th, 2009   2

and omit one of the "Photoshop Windows" articles, because it isn't the most recent one.

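If I just slap MAX(post_created) and GROUP BY app_platform onto my query, the results aren't always grouped correctly. From what I understand, I need to do some kind of inner join against a subquery?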
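To pin down what "the latest article per platform" means, here is a plain-Python sketch of the two steps such a subquery join performs, applied to the three result rows above: first find the newest post_created for each platform, then keep only the rows that match it. The ISO date strings are an assumption (a rewrite of the dates shown) so that ordinary string comparison matches chronological order.

```python
# The three rows returned by the basic query, with dates rewritten as
# ISO-8601 strings (assumption) so string comparison is chronological.
rows = [
    ("Photoshop", "Windows",   "2009-10-20", 1),
    ("Photoshop", "Windows",   "2009-12-01", 3),
    ("Photoshop", "Macintosh", "2009-11-10", 2),
]

# Step 1 (what the GROUP BY subquery computes): MAX(post_created) per platform.
max_created = {}
for _, platform, created, _ in rows:
    if platform not in max_created or created > max_created[platform]:
        max_created[platform] = created

# Step 2 (what the join back does): keep each row whose post_created
# equals the maximum for its platform.
result = [row for row in rows if row[2] == max_created[row[1]]]
print(result)
```

The output keeps post 3 for Windows and post 2 for Macintosh, matching the desired result; if two posts on one platform shared the newest date, both would be kept.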

3 answers:

Answer 0 (score: 4)

Since you have quite a few JOINs, I'd suggest creating a VIEW first:

CREATE VIEW articles AS
    SELECT    a.app_name, a.app_platform, p.post_created, p.post_id
    FROM      apps a
    JOIN      app_to_post ap ON ap.atp_app_id = a.app_id
    JOIN      posts p ON ap.atp_post_id = p.post_id
    WHERE     p.post_type = 'Article' AND p.post_status = 'a';

Then you can use a NULL-self-join:

SELECT     a1.app_name, a1.app_platform, a1.post_created, a1.post_id
FROM       articles a1
LEFT JOIN  articles a2 ON 
           a2.app_platform = a1.app_platform AND a2.post_created > a1.post_created
WHERE      a2.post_id IS NULL;

Test case:

CREATE TABLE posts (
   post_id          int,
   post_created     datetime,
   post_type        varchar(30),
   post_status      char(1)
);

CREATE TABLE apps (
   app_id           int,
   app_name         varchar(40),
   app_platform     varchar(40)
);

CREATE TABLE app_to_post (
   atp_id           int,
   atp_app_id       int,
   atp_post_id      int
);

INSERT INTO posts VALUES (1, '2010-10-06 05:00:00', 'Article', 'a');
INSERT INTO posts VALUES (2, '2010-10-06 06:00:00', 'Article', 'a');
INSERT INTO posts VALUES (3, '2010-10-06 07:00:00', 'Article', 'a');
INSERT INTO posts VALUES (4, '2010-10-06 08:00:00', 'Article', 'a');
INSERT INTO posts VALUES (5, '2010-10-06 09:00:00', 'Article', 'a');

INSERT INTO apps VALUES (1, 'Photoshop', 'Windows');
INSERT INTO apps VALUES (2, 'Photoshop', 'Macintosh');

INSERT INTO app_to_post VALUES (1, 1, 1);
INSERT INTO app_to_post VALUES (1, 1, 2);
INSERT INTO app_to_post VALUES (1, 2, 3);
INSERT INTO app_to_post VALUES (1, 2, 4);
INSERT INTO app_to_post VALUES (1, 1, 5);

Result:

+-----------+--------------+---------------------+---------+
| app_name  | app_platform | post_created        | post_id |
+-----------+--------------+---------------------+---------+
| Photoshop | Macintosh    | 2010-10-06 08:00:00 |       4 |
| Photoshop | Windows      | 2010-10-06 09:00:00 |       5 |
+-----------+--------------+---------------------+---------+
2 rows in set (0.00 sec)
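The test case can be replayed without a MySQL server. A minimal sketch, assuming Python's built-in sqlite3 as a stand-in: column types are simplified and datetimes are stored as ISO-8601 text, which sorts chronologically. It builds the same tables, rows, and view, then runs the NULL-self-join; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; datetimes are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER, post_created TEXT,
                    post_type TEXT, post_status TEXT);
CREATE TABLE apps (app_id INTEGER, app_name TEXT, app_platform TEXT);
CREATE TABLE app_to_post (atp_id INTEGER, atp_app_id INTEGER, atp_post_id INTEGER);

INSERT INTO posts VALUES
  (1, '2010-10-06 05:00:00', 'Article', 'a'),
  (2, '2010-10-06 06:00:00', 'Article', 'a'),
  (3, '2010-10-06 07:00:00', 'Article', 'a'),
  (4, '2010-10-06 08:00:00', 'Article', 'a'),
  (5, '2010-10-06 09:00:00', 'Article', 'a');
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows'), (2, 'Photoshop', 'Macintosh');
INSERT INTO app_to_post VALUES (1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (1, 1, 5);

CREATE VIEW articles AS
  SELECT a.app_name, a.app_platform, p.post_created, p.post_id
  FROM apps a
  JOIN app_to_post ap ON ap.atp_app_id = a.app_id
  JOIN posts p ON ap.atp_post_id = p.post_id
  WHERE p.post_type = 'Article' AND p.post_status = 'a';
""")

# NULL-self-join: keep a1 only when no a2 on the same platform is newer,
# i.e. the LEFT JOIN found no match and a2's columns are NULL.
rows = conn.execute("""
  SELECT a1.app_name, a1.app_platform, a1.post_created, a1.post_id
  FROM articles a1
  LEFT JOIN articles a2
    ON a2.app_platform = a1.app_platform
   AND a2.post_created > a1.post_created
  WHERE a2.post_id IS NULL
  ORDER BY a1.app_platform
""").fetchall()

for row in rows:
    print(row)
```

Because the comparison is a strict `>`, two posts on the same platform that share the newest post_created do not exclude each other, so both rows would be returned.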

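As a side note, a junction table generally doesn't need a surrogate key. You can instead define a composite primary key (ideally with foreign keys to the referenced tables):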

CREATE TABLE app_to_post (
   atp_app_id       int,
   atp_post_id      int,
   PRIMARY KEY (atp_app_id, atp_post_id),
   FOREIGN KEY (atp_app_id) REFERENCES apps (app_id),
   FOREIGN KEY (atp_post_id) REFERENCES posts (post_id)
) ENGINE=INNODB;
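A small sketch of what the composite key buys you, run on Python's built-in sqlite3 as a stand-in for MySQL/InnoDB. The PRIMARY KEY columns on apps and posts and the sample rows are assumptions added so the foreign keys have something to reference; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE apps (app_id INTEGER PRIMARY KEY, app_name TEXT, app_platform TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_created TEXT,
                    post_type TEXT, post_status TEXT);
CREATE TABLE app_to_post (
  atp_app_id  INTEGER,
  atp_post_id INTEGER,
  PRIMARY KEY (atp_app_id, atp_post_id),
  FOREIGN KEY (atp_app_id) REFERENCES apps (app_id),
  FOREIGN KEY (atp_post_id) REFERENCES posts (post_id)
);
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows');
INSERT INTO posts VALUES (1, '2010-10-06 05:00:00', 'Article', 'a');
INSERT INTO app_to_post VALUES (1, 1);
""")

def try_insert(app_id, post_id):
    """Return True if the link row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO app_to_post VALUES (?, ?)", (app_id, post_id))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert(1, 1)   # same pair again: primary key violation
dangling_ok = try_insert(1, 99)   # post 99 does not exist: foreign key violation
print(duplicate_ok, dangling_ok)  # False False
```

The composite key makes each (app, post) pair unique by construction, which is exactly the job the unused atp_id column was not doing in the test data above (every row there has atp_id = 1).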

Answer 1 (score: 3)

Let's first think about how to get the rows with the maximum value, going from your query's result to the result you want:

Your result (let's call it table T):

app_name    app_platform   post_created      post_id
Photoshop   Windows        Oct. 20th, 2009   1
Photoshop   Windows        Dec. 1, 2009      3
Photoshop   Macintosh      Nov. 10th, 2009   2

Your desired result:

app_name    app_platform   post_created      post_id
Photoshop   Windows        Dec. 1, 2009      3
Photoshop   Macintosh      Nov. 10th, 2009   2

To get that result, you should:

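  1. Compute the maximum post_created for each platform in table T.
  2. Join those maximums back to the original table T to get the values of the other columns in each matching row.
  3. Combine the two steps into a single query: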

    SELECT
      t1.app_name, t1.app_platform, t1.post_created, t1.post_id
    FROM
      (SELECT app_platform, MAX(post_created) AS MaxPostCreated
       FROM T
       GROUP BY app_platform) AS t2
      JOIN T AS t1
        ON t1.app_platform = t2.app_platform
       AND t1.post_created = t2.MaxPostCreated;
    

    In this query, the subquery performs step 1 and the join performs step 2.

    Combined with your own query, the final solution looks like this (using a view):

    CREATE VIEW T AS
        SELECT    a.app_name, a.app_platform, p.post_created, p.post_id
        FROM      apps a
        JOIN      app_to_post ap ON ap.atp_app_id = a.app_id
        JOIN      posts p ON ap.atp_post_id = p.post_id
        WHERE     p.post_type = 'Article' AND p.post_status = 'a';
    
    SELECT
      t1.app_name, t1.app_platform, t1.post_created, t1.post_id
    FROM
      (SELECT app_platform, MAX(post_created) AS MaxPostCreated
       FROM T
       GROUP BY app_platform) AS t2
      JOIN T AS t1
        ON t1.app_platform = t2.app_platform
       AND t1.post_created = t2.MaxPostCreated;
    

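    By the way, our team is actually developing a tool that tries to help users write queries automatically: users give the tool input-output examples, and the tool generates the query. (The first part of the query above was actually generated by the tool!) Our prototype is here: https://github.com/Mestway/Scythe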

    Hope this helps. :)
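A sketch of the final view-plus-join query from this answer, run on Python's built-in sqlite3 as a stand-in for MySQL. The table rows are hypothetical, chosen only to reproduce the question's sample output (table T), with dates stored as ISO text so MAX() picks the chronologically latest value; the ORDER BY is added for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER, post_created TEXT,
                    post_type TEXT, post_status TEXT);
CREATE TABLE apps (app_id INTEGER, app_name TEXT, app_platform TEXT);
CREATE TABLE app_to_post (atp_id INTEGER, atp_app_id INTEGER, atp_post_id INTEGER);

-- Hypothetical rows that reproduce the question's sample output.
INSERT INTO posts VALUES
  (1, '2009-10-20', 'Article', 'a'),
  (2, '2009-11-10', 'Article', 'a'),
  (3, '2009-12-01', 'Article', 'a');
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows'), (2, 'Photoshop', 'Macintosh');
INSERT INTO app_to_post VALUES (1, 1, 1), (2, 2, 2), (3, 1, 3);

CREATE VIEW T AS
  SELECT a.app_name, a.app_platform, p.post_created, p.post_id
  FROM apps a
  JOIN app_to_post ap ON ap.atp_app_id = a.app_id
  JOIN posts p ON ap.atp_post_id = p.post_id
  WHERE p.post_type = 'Article' AND p.post_status = 'a';
""")

# Step 1: derived table t2 holds MAX(post_created) per platform.
# Step 2: join back to T to recover the other columns of the matching row.
rows = conn.execute("""
  SELECT t1.app_name, t1.app_platform, t1.post_created, t1.post_id
  FROM (SELECT app_platform, MAX(post_created) AS MaxPostCreated
        FROM T
        GROUP BY app_platform) AS t2
  JOIN T AS t1
    ON t1.app_platform = t2.app_platform
   AND t1.post_created = t2.MaxPostCreated
  ORDER BY t1.app_platform
""").fetchall()
print(rows)
```

As with the NULL-self-join, an equality join on the maximum returns every row tied for the newest post_created on a platform.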

Answer 2 (score: 0)

You're on the right track.

Try adding

group by app_name,app_platform
having post_created=max(post_created)

Or, if your post_id values are sequential, so that a higher value always reflects a later post, use this clause instead: having post_id=max(post_id)
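A caveat on the HAVING approach: after GROUP BY each group collapses to a single row, so HAVING can only keep or drop a whole group, and a non-aggregated column such as post_created or post_id inside it takes its value from an indeterminate row of the group. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) such a query is rejected outright. Below is a sketch of the same post_id idea that avoids this, assuming post_id really does increase with post time: find MAX(post_id) per (app_name, app_platform) in a derived table, then join back. It uses Python's built-in sqlite3 as a stand-in for MySQL, with a hypothetical articles table holding the question's three sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (app_name TEXT, app_platform TEXT,
                       post_created TEXT, post_id INTEGER);
-- Hypothetical stand-in for the question's joined result.
INSERT INTO articles VALUES
  ('Photoshop', 'Windows',   '2009-10-20', 1),
  ('Photoshop', 'Windows',   '2009-12-01', 3),
  ('Photoshop', 'Macintosh', '2009-11-10', 2);
""")

# Pick the winning post_id per group in a derived table, then join back
# so the other columns come from exactly that row.
rows = conn.execute("""
  SELECT a.app_name, a.app_platform, a.post_created, a.post_id
  FROM articles a
  JOIN (SELECT app_name, app_platform, MAX(post_id) AS max_id
        FROM articles
        GROUP BY app_name, app_platform) m
    ON m.app_name = a.app_name
   AND m.app_platform = a.app_platform
   AND m.max_id = a.post_id
  ORDER BY a.app_platform
""").fetchall()
print(rows)
```

Since post_id is normally unique, this variant returns exactly one row per group even when two posts share the same timestamp.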