我差不多有200字。我想看看这些单词出现在表格列中的次数。
例如:假设我们对包含两行的列语句进行了表测试。
现在我想找到“你”和“怎么样”这两个词的出现。输出应该是这样的:
word count
you 3
how 2
因为“你”有3个,两行中有2个出现。
我该怎么做?
答案 0 :(得分:0)
你可以这样做:
select
。答案 1 :(得分:0)
我接近这个的方法是写一点user defined function来给我一个字符串出现在另一个字符串中的次数,但有一些允许:
然后我会创建一个包含我想要搜索的所有单词的表格,即您的200个列表。然后使用该函数计算每个短语中每个单词的出现次数,将其放入内联视图中,然后按搜索词对结果求和。
因此:
用户定义的功能
DELIMITER $$
CREATE FUNCTION `get_word_count`(phrase VARCHAR(500),word VARCHAR(255), delimiter VARCHAR(1)) RETURNS int(11)
READS SQL DATA
BEGIN
DECLARE cur_position INT DEFAULT 1 ;
DECLARE remainder TEXT;
DECLARE cur_string VARCHAR(255);
DECLARE delimiter_length TINYINT UNSIGNED;
DECLARE total INT;
DECLARE result DOUBLE DEFAULT 0;
DECLARE string2 VARCHAR(255);
SET remainder = replace(phrase,'!',' ');
SET remainder = replace(remainder,'.',' ');
SET remainder = replace(remainder,',',' ');
SET remainder = replace(remainder,'?',' ');
SET remainder = replace(remainder,':',' ');
SET remainder = replace(remainder,'(',' ');
SET remainder = lower(remainder);
SET string2 = concat(delimiter,trim(word),delimiter);
SET delimiter_length = CHAR_LENGTH(delimiter);
SET cur_position = 1;
WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO
SET cur_position = INSTR(remainder, delimiter);
IF cur_position = 0 THEN
SET cur_string = remainder;
ELSE
SET cur_string = concat(delimiter,LEFT(remainder, cur_position - 1),delimiter);
END IF;
IF TRIM(cur_string) != '' THEN
set result = result + (select instr(string2,cur_string) > 0);
END IF;
SET remainder = SUBSTRING(remainder, cur_position + delimiter_length);
END WHILE;
RETURN result;
END$$
DELIMITER ;
您可能需要稍微使用此功能,具体取决于您需要为标点符号和大小写做出哪些限制。希望你能在这里得到这个想法!
填充表格
create table search_word
(id int unsigned primary key auto_increment,
word varchar(250) not null
);
insert into search_word (word) values ('you');
insert into search_word (word) values ('how');
insert into search_word (word) values ('to');
insert into search_word (word) values ('too');
insert into search_word (word) values ('the');
insert into search_word (word) values ('and');
insert into search_word (word) values ('world');
insert into search_word (word) values ('hello');
create table phrase_to_search
(id int unsigned primary key auto_increment,
phrase varchar(500) not null
);
insert into phrase_to_search (phrase) values ("How are you. It's been long since I met you");
insert into phrase_to_search (phrase) values ("I am fine how are you?");
insert into phrase_to_search (phrase) values ("Oh. Not bad. All is ok with the world, I think");
insert into phrase_to_search (phrase) values ("I think so too!");
insert into phrase_to_search (phrase) values ("You know what? I think so too!");
运行查询
select word,sum(word_count) as total_word_count
from
(
select phrase,word,get_word_count(phrase,word," ") as word_count
from search_word
join phrase_to_search
) t
group by word
order by total_word_count desc;
答案 2 :(得分:0)
这是一个解决方案:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
以下是完整的工作小提琴:http://sqlfiddle.com/#!2/17481a/1
首先,我们通过@peterm进行查询以提取所有单词here(如果要自定义处理的单词总数,请按照他的说明操作)。然后我们将其转换为子查询,然后我们COUNT
和GROUP BY
每个单词的值,然后在GROUP BY
之上进行另一个查询,而不是分组的单词可能存在迹象。即:你好=你好!使用REPLACE
答案 3 :(得分:-1)
以下是您需要计算某些单词出现次数的简单解决方案,而不是完整的统计数据:
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%how%';
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%you%';