Question

注意：您可以在此处找到我之前的问题及其答案 - MySQL: Writing a complex query

我有3张桌子。

表Words_Learned包含用户已知的所有单词，以及单词学习的顺序。它有3列1）单词ID和2）用户ID和3）学习单词的顺序。

表Article包含文章。它有3列1）文章ID，2）唯一字数和3）文章内容。

表Words包含每篇文章中包含的所有唯一单词的列表。它有2列1）字ID和2）文章ID

数据库图如下/

enter image description here

您可以从此处下载数据库代码：https://www.dropbox.com/s/3gr659y5mk05i5w/tests.sql?dl=0

现在，使用这个数据库并使用“only”mysql，我需要做以下工作。

给定一个用户ID，它应该得到该用户已知的所有单词的列表，按照学习它们的顺序排序。换句话说，最近学过的单词将位于列表的顶部。

假设对用户ID的查询显示他们已经记住了以下3个单词，并且我们跟踪他们学习单词的顺序。八达通 - 3 狗 - 2 勺子 - 1

首先，我们获得包含Octopus一词的所有文章的列表，然后使用表Words对这些文章进行计算。计算意味着如果该文章包含超过10个未出现在用户词汇表列表中的单词（从表words_learned中提取），则它将从列表中排除。

然后，我们查询包含dog的所有记录，但不要包含“章鱼”

然后，我们查询包含勺子的所有记录，但不要包含八达通或狗的字样

在我们找到符合此条件的100条记录之前，您一直在执行此重复过程。

为了实现这个过程，我做了以下

SELECT `words_learned`.`idwords`,
Words.`idArticle`
FROM words_learned
INNER JOIN Words ON Words.idWords = Words_Learned.`idwords`
WHERE words_learned.userId = 1
ORDER BY Words_Learned.`order` DESC

在我的查询中，我已经介绍了获取文章，这意味着在这里 - First we get a list of all articles containing the word Octopus, and then do the calculation using table Words on just those articles.。但是我需要做些什么来覆盖其余的呢？

更新

这是一个更好理解的博士。

Do while articles found < 100
{
 for each ($X as known words, in order that those words were learned)
 {
  Select all articles that contain the word $X, where the 1) article has not been included in any previous loops, and 2)where the count of "unknown" words is less than 10. 

  Keep these articles in order. 
 }
}

Answer 1

所以，我认为就是这样。你想得到“最好的”100篇文章，其中“最好”意味着后来学到的一个词包含的越好。所以我查找每篇文章的最后学习单词（每篇文章的max（words_learned.order））。然后我按顺序显示文章ID并停在100。

select w.idarticle, max(l.`order`)
from words w
join words_learned l on l.idwords = w.idwords and l.userid = 123
group by w.idarticle
order by max(l.`order`) desc
limit 100;

您已编辑了您的请求。您希望将结果限制为包含不超过十个未知单词的文章。为此，您现在必须外部加入学习单词，这样您就可以计算未知单词（即外连接记录）。使用HAVING从列表中删除不需要的文章。

select w.idarticle, max(l.`order`)
from words w
left join words_learned l on l.idwords = w.idwords and l.iduser = 123
group by w.idarticle
having sum(l.idwords is null) <= 10 and max(l.`order`) is not null
order by max(l.`order`) desc
limit 100;

Answer 2

我很想得到一个子查询，它可以获取一个人所学过的所有单词并将其与自身联系起来，并将GROUP_CONCAT一起与计数结合起来。所以给予： -

Octopus, NULL, 0
Dog, "Octopus", 1
Spoon, "Octopus,Dog", 2

因此子查询将类似于： -

SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
FROM words_learned sub0
LEFT OUTER JOIN words_learned sub1
ON sub0.userId = sub1.userId
AND sub0.order_learned < sub1.order_learned
WHERE sub0.userId = 1
GROUP BY sub0.idwords

给予

idwords    excl_words    older_words_cnt
1          NULL          0
2          1             1
3          1,2           2

然后将其结果与其他表联系起来，检查主要idwords匹配但没有找到其他表的文章。

像这样的东西（虽然没有测试为没有测试数据）： -

SELECT sub_words.idwords, words_inc.idArticle
(
    SELECT sub0.idwords, SUBSTRING_INDEX(GROUP_CONCAT(sub1.idwords), ',', 10) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
) sub_words
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100

编辑 - 更新以排除超过10个尚未学习的文章。

SELECT sub_words.idwords, words_inc.idArticle,
sub2.idArticle, sub2.count, sub2.content
FROM
(
    SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
) sub_words 
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
INNER JOIN
(
    SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
    FROM Article a
    INNER JOIN words b
    ON a.idArticle = b.idArticle
    LEFT OUTER JOIN words_learned c
    ON b.idwords = c.idwords
    AND c.userId = 1
    GROUP BY a.idArticle, a.count, a.content
    HAVING unlearned_words_count < 10
) sub2
ON words_inc.idArticle = sub2.idArticle
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100

编辑 - 尝试评论上述查询： -

这只是选择列

SELECT sub_words.idwords, words_inc.idArticle,
sub2.idArticle, sub2.count, sub2.content
FROM

此子查询获取每个学习的单词，以及带有较大order_learned的单词的逗号分隔列表。这是针对特定用户ID的

(
    SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
) sub_words

这只是为了获取文章（即从上面的子查询中学习的单词）用于

INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords

此子查询获取特定用户尚未学习的文章少于10个单词。

INNER JOIN
(
    SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
    FROM Article a
    INNER JOIN words b
    ON a.idArticle = b.idArticle
    LEFT OUTER JOIN words_learned c
    ON b.idwords = c.idwords
    AND c.userId = 1
    GROUP BY a.idArticle, a.count, a.content
    HAVING unlearned_words_count < 10
) sub2
ON words_inc.idArticle = sub2.idArticle

此连接用于查找在第一个子查询的逗号分页列表中包含单词的文章（即具有较大order_learned的单词）。这是作为LEFT OUTER JOIN完成的，因为我想要排除找到的任何单词（这是通过检查NULL在WHERE子句中完成的）

LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100

Answer 3

我再次阅读了这个问题并注意到它要复杂得多。

首先，你要展示单词。而且你是否显示一个单词取决于该单词和所有之前学过的单词（以及它们出现的文章）。

所以学到了这些话：

word      order
Octopus   3
Dog       2
Spoon     1 (i.e.first learned)

这些文章：

article  contains Octopus   contains Dog   contains spoon   unknown words
A             yes               yes             yes              5
B             yes               yes             no              11
C             yes               no              yes             15
D             no                yes             yes              2
E             no                yes             no               0
F             no                no              yes              8
G             no                no              no               3
H             no                no              no              20

你...

检查“Octupus”并因文章B或C而将其解雇。
检查“狗”并保留，因为文章D和E都可以，B必须被忽略（因为它包含“八达通”）。
检查“勺子”并保留，文章F没问题，必须忽略C（因为它包含“八达通”）。

所以你展示“狗”和“勺子”而不是“章鱼”。如果不只有两场比赛，而是千场比赛，那么你将展示前100场，然后停下来。

鉴于此算法，我们可以得出结论：

只要学习的单词太少，就不会显示任何结果。
在某些时候，将学会足够的单词来查找少于11个未知单词的文章。可能不会显示最后学过的单词（在示例中为“八达通”），因为仍有许多文章包含太多未知单词。但是会显示较早的单词（因为最后学过的单词会过滤掉难以阅读的文章）。
然后有一天，大部分的话会被学习。然后是最后学到的单词将会显示出来。

查询：

select idwords
from words_learned
where userid = 123
and not exists
(
  select w.idarticle
  from words w
  left join words_learned l on l.idwords = w.idwords and l.userid = 123
  group by w.idarticle
  having sum(l.idwords is null) > 10 and max(l.`order`) = words_learned.`order`
)
order by `order` desc
limit 100;

这是一个SQL小提琴：http://sqlfiddle.com/#!2/19bf8/1。

编写复杂的MySQL查询

3 个答案: