I have a table of words used in article titles. I want to find the words that are used least across a set of article titles.
Example:
Titles
"Congressman Joey of Texas does not sign bill C1234."
"The pretty blue bird flies at night in Texas."
"Congressman Bob of Arizona is the signs bill C1234."
The table contains the following.
Table WORDS_LIST
---------------------------------------
| INDEX ID | WORD        | ARTICLE ID |
---------------------------------------
| 1        | CONGRESSMAN | 1234       |
| 2        | JOEY        | 1234       |
| 3        | SIGN        | 1234       |
| 4        | BILL        | 1234       |
| 5        | C1234       | 1234       |
| 6        | TEXAS       | 1234       |
| 7        | PRETTY      | 1235       |
| 8        | BLUE        | 1245       |
| 9        | BIRD        | 1245       |
| 10       | FLIES       | 1245       |
| 11       | NIGHT       | 1245       |
| 12       | TEXAS       | 1245       |
| 13       | CONGRESSMAN | 1246       |
| 14       | BOB         | 1246       |
| 15       | ARIZONA     | 1246       |
| 16       | SIGNS       | 1246       |
| 17       | BILL        | 1246       |
| 18       | C1234       | 1246       |
---------------------------------------
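In this case, the words "Pretty, Blue, Flies, Night" would be the ones used in the fewest articles.
I would appreciate any ideas on how best to build this query. Below is what I have so far. I could also write something in PHP, but I assume a query would be faster.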
SELECT distinct a1.`word`, count(a1.`word`)
FROM mmdb.words_list a1
JOIN mmdb.words_list b1
ON a1.id = b1.id AND
upper(a1.word) = upper(b1.word)
where date(a1.`publish_date`) = '2017-06-09'
group by `word`
order by count(a1.`word`);
Answer 0 (score: 4)
I don't see why a self-join is needed. Do something like this:
select wl.word, count(*)
from mmdb.words_list wl
where date(wl.`publish_date`) = '2017-06-09'
group by wl.word
order by count(*);
You can add a limit clause to get a fixed number of words. If publish_date is already a date, compare it directly:
where publish_date = '2017-06-09'
If it has a time component:
where publish_date >= '2017-06-09' and publish_date < '2017-06-10'
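This form lets MySQL use an index on publish_date, because the column is compared directly instead of being wrapped in date().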
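Putting the pieces of this answer together, a rough sketch might look like the following. The column names article_id and publish_date, the index name idx_words_publish_date, and the assumption that publish_date is stored in words_list are guesses based on the question, not a confirmed schema. COUNT(DISTINCT article_id) is a swapped-in variant of count(*), useful only if a word can appear more than once per article.

```sql
-- Hypothetical index; its name and column list are assumptions.
CREATE INDEX idx_words_publish_date
    ON mmdb.words_list (publish_date, word, article_id);

-- Half-open range on the bare column, so the index above can be used
-- even when publish_date carries a time component.
SELECT wl.word,
       COUNT(DISTINCT wl.article_id) AS num_articles  -- articles, not rows
FROM mmdb.words_list wl
WHERE wl.publish_date >= '2017-06-09'
  AND wl.publish_date <  '2017-06-10'
GROUP BY wl.word
ORDER BY num_articles, wl.word
LIMIT 10;  -- fixed number of least-used words; adjust as needed
```

Ordering by word as a tie-breaker keeps the result stable when many words share the same count.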
Answer 1 (score: 0)
Try this. It is a bit simpler and should return the correct result:
SELECT `WORD`,
COUNT(*) as `num_articles`
FROM `WORDS_LIST`
WHERE date(`publish_date`) = '2017-06-09'
GROUP BY `WORD`
ORDER BY COUNT(*) ASC;
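Since several words can tie for the lowest count, sorting alone leaves it to the reader to see where the ties end. One possible way to return only the words tied at the minimum is sketched below against a small demo table. The table name words_list_demo, its columns, and its rows are made up for illustration, and the date filter is left out to keep the sketch short.

```sql
-- Hypothetical demo table; names, types, and rows are assumptions.
-- A regular table is used because MySQL cannot reference a TEMPORARY
-- table more than once in the same query.
CREATE TABLE words_list_demo (
    id         INT PRIMARY KEY,
    word       VARCHAR(64) NOT NULL,
    article_id INT NOT NULL
);

INSERT INTO words_list_demo (id, word, article_id) VALUES
    (1, 'CONGRESSMAN', 1234),
    (2, 'TEXAS',       1234),
    (3, 'BLUE',        1245),
    (4, 'TEXAS',       1245),
    (5, 'CONGRESSMAN', 1246),
    (6, 'BOB',         1246);

-- Keep only the words whose article count equals the smallest count.
SELECT word,
       COUNT(DISTINCT article_id) AS num_articles
FROM words_list_demo
GROUP BY word
HAVING COUNT(DISTINCT article_id) = (
    SELECT MIN(cnt)
    FROM (
        SELECT COUNT(DISTINCT article_id) AS cnt
        FROM words_list_demo
        GROUP BY word
    ) AS per_word
)
ORDER BY word;
-- With the demo rows above this returns BLUE and BOB, each with num_articles = 1.

DROP TABLE words_list_demo;
```

The inner query computes the count per word once more to find the minimum, so on a large table it may be worth comparing its cost against a plain ORDER BY with LIMIT.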