编写复杂的MySQL查询

时间:2014-10-30 14:16:06

标签: mysql sql database select join

注意:您可以在此处找到我之前的问题及其答案 - MySQL: Writing a complex query


我有3张桌子。

Words_Learned包含用户已知的所有单词,以及单词学习的顺序。它有3列1)单词ID和2)用户ID和3)学习单词的顺序。

Article包含文章。它有3列1)文章ID,2)唯一字数和3)文章内容。

Words包含每篇文章中包含的所有唯一单词的列表。它有2列1)字ID和2)文章ID

数据库图如下/

enter image description here

您可以从此处下载数据库代码:https://www.dropbox.com/s/3gr659y5mk05i5w/tests.sql?dl=0

现在,使用这个数据库并使用“only”mysql,我需要做以下工作。

给定一个用户ID,它应该得到该用户已知的所有单词的列表,按照学习它们的顺序排序。换句话说,最近学过的单词将位于列表的顶部。

假设对用户ID的查询显示他们已经记住了以下3个单词,并且我们跟踪他们学习单词的顺序。 八达通 - 3 狗 - 2 勺子 - 1

首先,我们获得包含Octopus一词的所有文章的列表,然后使用表Words对这些文章进行计算。计算意味着如果该文章包含超过10个未出现在用户词汇表列表中的单词(从表words_learned中提取),则它将从列表中排除。

然后,我们查询包含dog的所有记录,但不要包含“章鱼”

然后,我们查询包含勺子的所有记录,但不要包含八达通或狗的字样

在我们找到符合此条件的100条记录之前,您一直在执行此重复过程。

为了实现这个过程,我做了以下

SELECT `words_learned`.`idwords`,
Words.`idArticle`
FROM words_learned
INNER JOIN Words ON Words.idWords = Words_Learned.`idwords`
WHERE words_learned.userId = 1
ORDER BY Words_Learned.`order` DESC

在我的查询中,我已经介绍了获取文章,这意味着在这里 - First we get a list of all articles containing the word Octopus, and then do the calculation using table Words on just those articles.。但是我需要做些什么来覆盖其余的呢?

更新

这是一个更好理解的博士。

Do while articles found < 100
{
 for each ($X as known words, in order that those words were learned)
 {
  Select all articles that contain the word $X, where the 1) article has not been included in any previous loops, and 2)where the count of "unknown" words is less than 10. 

  Keep these articles in order. 
 }
}

3 个答案:

答案 0 :(得分:1)

所以,我认为就是这样。你想得到“最好的”100篇文章,其中“最好”意味着后来学到的一个词包含的越好。所以我查找每篇文章的最后学习单词(每篇文章的max(words_learned.order))。然后我按顺序显示文章ID并停在100。

select w.idarticle, max(l.`order`)
from words w
join words_learned l on l.idwords = w.idwords and l.userid = 123
group by w.idarticle
order by max(l.`order`) desc
limit 100;

您已编辑了您的请求。您希望将结果限制为包含不超过十个未知单词的文章。为此,您现在必须外部加入学习单词,这样您就可以计算未知单词(即外连接记录)。使用HAVING从列表中删除不需要的文章。

select w.idarticle, max(l.`order`)
from words w
left join words_learned l on l.idwords = w.idwords and l.iduser = 123
group by w.idarticle
having sum(l.idwords is null) <= 10 and max(l.`order`) is not null
order by max(l.`order`) desc
limit 100;

答案 1 :(得分:1)

我很想得到一个子查询,它可以获取一个人所学过的所有单词并将其与自身联系起来,并将GROUP_CONCAT一起与计数结合起来。所以给予: -

Octopus, NULL, 0
Dog, "Octopus", 1
Spoon, "Octopus,Dog", 2

因此子查询将类似于: -

SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
FROM words_learned sub0
LEFT OUTER JOIN words_learned sub1
ON sub0.userId = sub1.userId
AND sub0.order_learned < sub1.order_learned
WHERE sub0.userId = 1
GROUP BY sub0.idwords

给予

idwords    excl_words    older_words_cnt
1          NULL          0
2          1             1
3          1,2           2

然后将其结果与其他表联系起来,检查主要idwords匹配但没有找到其他表的文章。

像这样的东西(虽然没有测试为没有测试数据): -

SELECT sub_words.idwords, words_inc.idArticle
(
    SELECT sub0.idwords, SUBSTRING_INDEX(GROUP_CONCAT(sub1.idwords), ',', 10) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
) sub_words
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100 

编辑 - 更新以排除超过10个尚未学习的文章。

SELECT sub_words.idwords, words_inc.idArticle,
sub2.idArticle, sub2.count, sub2.content
FROM
(
    SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
) sub_words 
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
INNER JOIN
(
    SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
    FROM Article a
    INNER JOIN words b
    ON a.idArticle = b.idArticle
    LEFT OUTER JOIN words_learned c
    ON b.idwords = c.idwords
    AND c.userId = 1
    GROUP BY a.idArticle, a.count, a.content
    HAVING unlearned_words_count < 10
) sub2
ON words_inc.idArticle = sub2.idArticle
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100

编辑 - 尝试评论上述查询: -

这只是选择列

SELECT sub_words.idwords, words_inc.idArticle,
sub2.idArticle, sub2.count, sub2.content
FROM

此子查询获取每个学习的单词,以及带有较大order_learned的单词的逗号分隔列表。这是针对特定用户ID的

(
    SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
) sub_words 

这只是为了获取文章(即从上面的子查询中学习的单词)用于

INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords

此子查询获取特定用户尚未学习的文章少于10个单词。

INNER JOIN
(
    SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
    FROM Article a
    INNER JOIN words b
    ON a.idArticle = b.idArticle
    LEFT OUTER JOIN words_learned c
    ON b.idwords = c.idwords
    AND c.userId = 1
    GROUP BY a.idArticle, a.count, a.content
    HAVING unlearned_words_count < 10
) sub2
ON words_inc.idArticle = sub2.idArticle

此连接用于查找在第一个子查询的逗号分页列表中包含单词的文章(即具有较大order_learned的单词)。这是作为LEFT OUTER JOIN完成的,因为我想要排除找到的任何单词(这是通过检查NULL在WHERE子句中完成的)

LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100

答案 2 :(得分:1)

我再次阅读了这个问题并注意到它要复杂得多。

首先,你要展示单词。而且你是否显示一个单词取决于该单词和所有之前学过的单词(以及它们出现的文章)。

所以学到了这些话:

word      order
Octopus   3
Dog       2
Spoon     1 (i.e.first learned)

这些文章:

article  contains Octopus   contains Dog   contains spoon   unknown words
A             yes               yes             yes              5
B             yes               yes             no              11
C             yes               no              yes             15
D             no                yes             yes              2
E             no                yes             no               0
F             no                no              yes              8
G             no                no              no               3
H             no                no              no              20

你...

  • 检查“Octupus”并因文章B或C而将其解雇。
  • 检查“狗”并保留,因为文章D和E都可以,B必须被忽略(因为它包含“八达通”)。
  • 检查“勺子”并保留,文章F没问题,必须忽略C(因为它包含“八达通”)。

所以你展示“狗”和“勺子”而不是“章鱼”。如果不只有两场比赛,而是千场比赛,那么你将展示前100场,然后停下来。

鉴于此算法,我们可以得出结论:

  • 只要学习的单词太少,就不会显示任何结果。
  • 在某些时候,将学会足够的单词来查找少于11个未知单词的文章。可能不会显示最后学过的单词(在示例中为“八达通”),因为仍有许多文章包含太多未知单词。但是会显示较早的单词(因为最后学过的单词会过滤掉难以阅读的文章)。
  • 然后有一天,大部分的话会被学习。然后是最后学到的单词将会显示出来。

查询:

select idwords
from words_learned
where userid = 123
and not exists
(
  select w.idarticle
  from words w
  left join words_learned l on l.idwords = w.idwords and l.userid = 123
  group by w.idarticle
  having sum(l.idwords is null) > 10 and max(l.`order`) = words_learned.`order`
)
order by `order` desc
limit 100;

这是一个SQL小提琴:http://sqlfiddle.com/#!2/19bf8/1