Question

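I found many similar questions, but I was unable to understand or apply the answers, and I don't really know what to search for...

I have 2 tables (docs and words) with a many-to-many relationship. I am trying to generate a list of the top 5 most frequently used words that do not appear in a specified doc.

To do this I have 2 mySQL queries, each of which achieves part of my goal:

Query #1 - returns words ordered by frequency of use, but falls short because it also returns all words (SQLFiddle.com)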

SELECT `words_idwords` as wdID, COUNT(*) as freq
    FROM docs_has_words 
    GROUP BY `words_idwords`
    ORDER BY  freq DESC, wdID ASC

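Query #2 - returns the words that are missing from a specified doc, but falls short because it does not order them by frequency of use (SQLFiddle.com)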

SELECT wordscol as wrd, idwords as wID 
    FROM `words`
    WHERE `idwords` NOT IN (SELECT `words_idwords` FROM `docs_has_words` WHERE `docs_iddocs` = 1)

But I would like the output to look like this:

| idwords | wordscol | freq |
-----------------------------
| 8       | Dog      | 3    |
| 3       | Ape      | 2    |
| 4       | Bear     | 1    |
| 6       | Cat      | 1    |
| 7       | Cheetah  | 1    |
| 5       | Beaver   | 0    |

Note: `Dolphin`, one of the most frequently used words, is NOT in the 
      list because it is already in the document iddocs = 1

Note: `Beaver` is a "never used word" but is in the list because it is
      in the main word list

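The question is: how can I combine these queries, or otherwise get the desired output?

Basic requirements:
- 3-column output
- results ordered by frequency of use, even when the usage is zero

Update

Based on some of the comments, the approach I had in mind when I came up with the 2 queries was:

Step 1 - find all the words that are in the main word list but are not used in doc 1

Step 2 - rank the words from step 1 by the number of docs that use them

Once I had the 2 queries, I thought it would be easy to combine them with a where clause, but I just can't get it to work.

A hack solution could be to add a dummy doc that contains all the words and then subtract 1 from freq (but I'm not that much of a hack!).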

Answer 1

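I see the problem now. I was misled by your statement about the result of the first query (emphasis mine):

returns words ordered by frequency of use, but falls short because it also returns ALL words

That query does not return all words, only all the used words.

So you need to left join the docs_has_words table onto the words table to get all the words, and then exclude the words associated with doc 1: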

SELECT w.idwords as wdID, w.wordscol, COUNT(d.words_idwords) as freq
    FROM words w
    LEFT JOIN `docs_has_words` d on w.idwords=d.words_idwords
    WHERE w.idwords not in (SELECT `words_idwords` FROM `docs_has_words` WHERE `docs_iddocs` = 1)
    GROUP BY w.idwords
    ORDER BY  freq DESC, wdID ASC;

See the sqlfiddle
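
For anyone who wants to try the left join approach without a MySQL server, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are hypothetical (they are not the contents of the original SQLFiddle, and Dolphin's id of 9 is made up); they were only chosen so that the result lines up with the output table in the question. The helper names build_demo_db and missing_words_by_freq are likewise invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database filled with HYPOTHETICAL sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE words (idwords INTEGER PRIMARY KEY, wordscol TEXT);
        CREATE TABLE docs_has_words (docs_iddocs INTEGER, words_idwords INTEGER);
    """)
    # Assumed word list; the id for Dolphin (9) is not given in the question.
    conn.executemany("INSERT INTO words VALUES (?, ?)", [
        (3, "Ape"), (4, "Bear"), (5, "Beaver"), (6, "Cat"),
        (7, "Cheetah"), (8, "Dog"), (9, "Dolphin"),
    ])
    # Assumed usage: doc 1 contains only Dolphin; Beaver is never used.
    conn.executemany("INSERT INTO docs_has_words VALUES (?, ?)", [
        (1, 9),
        (2, 9), (2, 8), (2, 3), (2, 4),
        (3, 9), (3, 8), (3, 3), (3, 6),
        (4, 8), (4, 7),
    ])
    return conn


def missing_words_by_freq(conn, doc_id):
    """Words not used in doc_id, ranked by how many docs use them.

    The LEFT JOIN keeps words with no rows in docs_has_words, and
    COUNT(d.words_idwords) skips the NULLs they produce, so those
    words get a freq of 0 instead of disappearing.
    """
    sql = """
        SELECT w.idwords AS wdID, w.wordscol, COUNT(d.words_idwords) AS freq
        FROM words w
        LEFT JOIN docs_has_words d ON w.idwords = d.words_idwords
        WHERE w.idwords NOT IN (
            SELECT words_idwords FROM docs_has_words WHERE docs_iddocs = ?
        )
        GROUP BY w.idwords
        ORDER BY freq DESC, wdID ASC
    """
    return conn.execute(sql, (doc_id,)).fetchall()


if __name__ == "__main__":
    for row in missing_words_by_freq(build_demo_db(), 1):
        print(row)
```

With these assumed rows, Dolphin is filtered out because it is in doc 1, and Beaver is kept with a freq of 0. Using COUNT(*) instead of COUNT(d.words_idwords) would count the NULL-padded row and report 1 for never-used words.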

Answer 2

I think @Shadow got it right in the comments; you just need to add a where clause like this: sqlFiddle

SELECT 
  `words_idwords` as wdID, 
  COUNT(*) as freq
FROM docs_has_words 
WHERE `words_idwords` NOT IN (SELECT `words_idwords` FROM `docs_has_words` WHERE `docs_iddocs` = 1)
GROUP BY `words_idwords`
ORDER BY  freq DESC, wdID ASC;

Does this produce the output you need?
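
One way to check is to run the query against a small data set. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and using hypothetical sample rows (not the original SQLFiddle data; Dolphin's id of 9 is made up). The helper names build_demo_db and used_words_not_in_doc are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database filled with HYPOTHETICAL sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE words (idwords INTEGER PRIMARY KEY, wordscol TEXT);
        CREATE TABLE docs_has_words (docs_iddocs INTEGER, words_idwords INTEGER);
    """)
    # Assumed word list; Beaver (5) exists but is never used in any doc.
    conn.executemany("INSERT INTO words VALUES (?, ?)", [
        (3, "Ape"), (4, "Bear"), (5, "Beaver"), (6, "Cat"),
        (7, "Cheetah"), (8, "Dog"), (9, "Dolphin"),
    ])
    # Assumed usage: doc 1 contains only Dolphin.
    conn.executemany("INSERT INTO docs_has_words VALUES (?, ?)", [
        (1, 9),
        (2, 9), (2, 8), (2, 3), (2, 4),
        (3, 9), (3, 8), (3, 3), (3, 6),
        (4, 8), (4, 7),
    ])
    return conn


def used_words_not_in_doc(conn, doc_id):
    """Run the where-clause variant, which reads only docs_has_words."""
    sql = """
        SELECT words_idwords AS wdID, COUNT(*) AS freq
        FROM docs_has_words
        WHERE words_idwords NOT IN (
            SELECT words_idwords FROM docs_has_words WHERE docs_iddocs = ?
        )
        GROUP BY words_idwords
        ORDER BY freq DESC, wdID ASC
    """
    return conn.execute(sql, (doc_id,)).fetchall()


if __name__ == "__main__":
    for row in used_words_not_in_doc(build_demo_db(), 1):
        print(row)
```

On this assumed data the ranking of the used words comes out in the requested order, but the result has only two columns, and a word with no rows in docs_has_words (Beaver here) cannot show up because the query never reads the words table.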

mySQL Multi Join from 2 statements
