Suppose I have a table shortText:
ID | SHORT_TEXT
------+---------------------------
001 | The elephants went in two by two
002 | Needles and haystack
003 | Somewhere over the rainbow
...
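How can I query shortText to count the occurrences of each word in the column SHORT_TEXT (without using a stored procedure), so that I get a result like this: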
WORD | OCCURENCE
------+------------
the | 2
and | 1
over | 1
...
Edit:
So far, no general answer has been given on SO (one that handles a variable number of words with no given maximum).
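Answer 0 (score: 2)
I think that if you build a full-text index on the column, you can read the words from the table that MySQL creates to support tokenization of the strings.
This is explained well here: https://dev.mysql.com/doc/refman/5.6/en/innodb-ft-index-table-table.html
This is the query to run after building the index: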
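-- 'mydb' is a placeholder schema name; the table must be named as 'schema/table'
SET GLOBAL innodb_ft_aux_table = 'mydb/shortText';
SELECT word, doc_count, doc_id, position FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;
and so on.
Note: I have not tested this, but I have done something similar in Oracle.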
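Answer 1 (score: 1)
In principle, you want to split the strings in shortText into individual words (i.e. split each string on spaces), then merge all the resulting arrays into one big list and count the words. I am afraid this may be asking too much of MySQL, but I can illustrate the principle in PostgreSQL as follows: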
select word,count(*) occurrence
from
(select
unnest(string_to_array(lower(short_text),' ')) word
from shortText) words
group by words.word
order by count(*) desc
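
The same split-then-count principle can also be sketched without unnest, using a recursive CTE that peels one space-delimited word off each string per step. This is only a sketch under assumptions: it runs against SQLite through Python's sqlite3 module so that it is self-contained, the names count_words and WORD_COUNT_SQL are made up for illustration, and it has not been tried on MySQL. MySQL 8.0 and later do support WITH RECURSIVE, but a port would need adapting (for example CONCAT in place of ||).

```python
import sqlite3

# Recursive CTE: each step moves the first space-delimited word of `rest`
# into `word` and keeps the remainder, until nothing is left.
# Written for SQLite (an assumption); not the MySQL dialect.
WORD_COUNT_SQL = """
WITH RECURSIVE split(word, rest) AS (
    SELECT '', lower(SHORT_TEXT) || ' ' FROM shortText
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ' ') - 1),
           substr(rest, instr(rest, ' ') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT word, COUNT(*) AS occurrence
FROM split
WHERE word <> ''          -- drops the seed rows and gaps from repeated spaces
GROUP BY word
ORDER BY occurrence DESC, word
"""


def count_words(rows):
    """Load (id, text) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shortText (ID TEXT, SHORT_TEXT TEXT)")
    conn.executemany("INSERT INTO shortText VALUES (?, ?)", rows)
    result = conn.execute(WORD_COUNT_SQL).fetchall()
    conn.close()
    return result


sample = [
    ("001", "The elephants went in two by two"),
    ("002", "Needles and haystack"),
    ("003", "Somewhere over the rainbow"),
]

for word, occurrence in count_words(sample):
    print(word, occurrence)
```

Because the recursion continues until each string is exhausted, no maximum number of words per row has to be fixed in advance. Punctuation is not stripped, so "rainbow" and "rainbow," would be counted as different words.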
Answer 2 (score: 0)
I found this interesting. It counts the number of words in a specific column:
SELECT SUM(LENGTH(`YourText`) - LENGTH(REPLACE(`YourText`, ' ', '' )) +1) FROM `table_name` WHERE `ID`='1';
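
To make the arithmetic behind this formula concrete, here is a minimal sketch that runs the same LENGTH/REPLACE expression over a few strings. It assumes SQLite via Python's sqlite3 module instead of MySQL, and total_word_estimate is a hypothetical helper name. Note that the expression yields a total word count (number of spaces plus one per row), not a count per distinct word.

```python
import sqlite3


def total_word_estimate(texts):
    """Sum (number of spaces + 1) over all rows, mirroring the formula above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (YourText TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?)", [(t,) for t in texts])
    (total,) = conn.execute(
        "SELECT SUM(LENGTH(YourText) - LENGTH(REPLACE(YourText, ' ', '')) + 1) "
        "FROM table_name"
    ).fetchone()
    conn.close()
    return total


# Two single spaces: 2 + 1 = 3 words.
print(total_word_estimate(["Needles and haystack"]))
# A doubled space adds a phantom word: 3 + 1 = 4.
print(total_word_estimate(["Needles  and haystack"]))
```

The second call shows the main limitation: every space is treated as a word boundary, so repeated, leading, or trailing spaces inflate the result.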