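I created a table filled with the first responses that came to people's minds when they looked at a photo. I have ~1400 entries. Now I want to see what the most common descriptions are.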
CREATE TABLE descript (
wordID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
wordText TEXT(50)
)
ENGINE=MyISAM;
INSERT INTO descript VALUES(0,"Big");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"fast");
INSERT INTO descript VALUES(0,"impressive");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"red");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"His");
INSERT INTO descript VALUES(0,"Black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"a black");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"pleasant");
INSERT INTO descript VALUES(0,"the fast");
INSERT INTO descript VALUES(0,"blue");
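and so on, with more entries before and after...
I need the text to be lowercase, which I do with this:
SELECT LOWER(wordText) FROM descript;
How do I count the most common answer and display it? I also have some stop words that I don't want included in the count, for example 'a' or 'the'. How do I leave them out?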
Answer 0: (score: 1)
The basic query is:
SELECT lower(wordText) as word, count(*)
FROM descript
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;
If you want to list the stop words in the query itself, you can remove them with NOT IN:
SELECT lower(wordText) as word, count(*)
FROM descript
WHERE lower(wordText) not in ('a', 'the', . . . )
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;
Or, if you have them in a table, left join to it and keep only the rows that found no matching stop word:
SELECT lower(d.wordText) as word, count(*)
FROM descript d left join
     stopwords sw
     on lower(d.wordText) = sw.word
WHERE sw.word is null
GROUP BY lower(d.wordText)
ORDER BY count(*) DESC
LIMIT 1;
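The answer does not show the stop-word table itself. As a rough sketch only: the table name `stopwords` and column `word` follow the join above, while the column type, the primary key, and the two sample rows are assumptions. The same filter is written here with NOT EXISTS, which is another common way to express "no matching stop word":

```sql
-- Hypothetical definition: only the names `stopwords` and `word` come from
-- the answer; the type, key, and sample rows are assumptions.
CREATE TABLE stopwords (
    word VARCHAR(50) NOT NULL PRIMARY KEY
);

INSERT INTO stopwords (word) VALUES ('a'), ('the');

-- Keep only descriptions that have no matching row in stopwords,
-- then count them case-insensitively and return the most frequent one.
SELECT LOWER(d.wordText) AS word, COUNT(*) AS cnt
FROM descript d
WHERE NOT EXISTS (
    SELECT 1
    FROM stopwords sw
    WHERE sw.word = LOWER(d.wordText)
)
GROUP BY LOWER(d.wordText)
ORDER BY cnt DESC
LIMIT 1;
```

Note that this removes only entries that consist entirely of a stop word. A multi-word entry such as "a black" is still counted as its own description.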
You can read about the stop words that are built into MySQL here.
Answer 1: (score: 0)
If you run
SELECT LOWER(wordText), COUNT(*) FROM descript GROUP BY LOWER(wordText);
you should be able to see how many there are of each word.
You can add an
ORDER BY
clause to sort the results by their counts.
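With the ORDER BY clause added, the query might look like the sketch below. The aliases `word` and `cnt` are illustrative names, not part of the original answer, and the secondary sort on `word` is an added assumption so that tied counts come back in a stable order:

```sql
-- Count each description case-insensitively and list the most frequent first.
SELECT LOWER(wordText) AS word, COUNT(*) AS cnt
FROM descript
GROUP BY LOWER(wordText)
ORDER BY cnt DESC, word;
```

Leaving off LIMIT shows the whole ranking, which makes ties visible. In the sample rows from the question, "blue" appears four times, while "big", "black" and "his" each appear three times once case is ignored.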
Answer 2: (score: 0)
To get the most frequently used value, you can use this query.
SELECT wordText, count(*) FROM descript GROUP BY wordText ORDER BY count(*) DESC LIMIT 1;