Question

Suppose I have a table shortText:

ID    | SHORT_TEXT
------+---------------------------
001   | The elephants went in two by two
002   | Needles and haystack
003   | Somewhere over the rainbow
...

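How can I query shortText to count the occurrences of each word in the SHORT_TEXT column (without using a stored procedure) and get a result like this: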

WORD  | OCCURRENCE
------+------------
the   | 2
and   | 1
over  | 1
...

Edit:

So far, no general answer has been given on SO (one that handles a variable number of words with no given maximum).

Answer 1

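I think that if you build a full-text index on the column, you can get the words from the table that is created to support tokenization of the strings.

This explains it well: https://dev.mysql.com/doc/refman/5.6/en/innodb-ft-index-table-table.html

This is the query to run after building the index: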

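SET GLOBAL innodb_ft_aux_table = 'mydb/shortText';  -- 'mydb' is a placeholder schema name
SELECT word, doc_count, doc_id, position FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;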

etc.

Note that I have not tested this, but I have done something similar in Oracle.

Answer 2

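In theory, you want to split the string "shortText" into individual words (i.e. split the string on spaces), then merge all the resulting arrays into one giant list and count the words. I fear this may be asking too much of MySQL, but I can illustrate the principle in PostgreSQL as follows: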

SELECT word, count(*) AS occurrence
FROM (
  SELECT unnest(string_to_array(lower(short_text), ' ')) AS word
  FROM shortText
) words
GROUP BY words.word
ORDER BY count(*) DESC;
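
The same split, flatten and count principle can be sketched outside the database. This is only an illustration in Python, not something the answer itself proposes: the `rows` list copies the three sample rows from the question, and `count_words` is a hypothetical helper name. Note that `str.split()` splits on any run of whitespace, whereas `string_to_array(..., ' ')` splits on each single space.

```python
from collections import Counter

# Sample rows copied from the question's shortText table (ID, SHORT_TEXT).
rows = [
    ("001", "The elephants went in two by two"),
    ("002", "Needles and haystack"),
    ("003", "Somewhere over the rainbow"),
]


def count_words(rows):
    """Lowercase each text, split it into words, and tally every word."""
    counts = Counter()
    for _id, short_text in rows:
        # lower() mirrors lower(short_text); split() breaks on whitespace.
        counts.update(short_text.lower().split())
    return counts


word_counts = count_words(rows)

# Print words from most to least frequent, like ORDER BY count(*) DESC.
for word, occurrence in word_counts.most_common():
    print(word, occurrence)
```

For the three sample rows this counts "the" twice and "and" and "over" once each, which agrees with the result table in the question.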

Answer 3

I found this interesting; it counts the total number of words in a specific column:

SELECT SUM(LENGTH(`YourText`) -  LENGTH(REPLACE(`YourText`, ' ', '' )) +1) FROM `table_name` WHERE `ID`='1';
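
To see what this expression computes, here is a small sketch that runs the same arithmetic in an in-memory SQLite database through Python's `sqlite3` module. SQLite is an assumption made only so the example is self-contained; the answer targets MySQL, and the table and column names are taken from the question rather than from the answer. The expression counts spaces and adds one per row, so it yields a total word count, not the per-word breakdown the question asks for, and it overcounts when words are separated by more than one space.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shortText (ID TEXT, SHORT_TEXT TEXT)")
conn.executemany(
    "INSERT INTO shortText VALUES (?, ?)",
    [
        ("001", "The elephants went in two by two"),
        ("002", "Needles and haystack"),
        ("003", "Somewhere over the rainbow"),
    ],
)

# Words per row = number of spaces + 1; SUM adds this up over all rows.
total_words = conn.execute(
    "SELECT SUM(LENGTH(SHORT_TEXT) - LENGTH(REPLACE(SHORT_TEXT, ' ', '')) + 1) "
    "FROM shortText"
).fetchone()[0]
print(total_words)  # 7 + 3 + 4 = 14

# Caveat: a double space is counted as an extra word.
double_space_count = conn.execute(
    "SELECT LENGTH(?) - LENGTH(REPLACE(?, ' ', '')) + 1", ("a  b", "a  b")
).fetchone()[0]
print(double_space_count)  # 3, although the text has only two words
```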

Counting words in a database column
