The table 'mytable' has a column named 'Description'.
+----+-------------------------------+
| ID | Description |
+----+-------------------------------+
| 1 | My NAME is Sajid KHAN |
| 2 | My Name is Ahmed Khan |
| 3 | MY friend name is Salman Khan |
+----+-------------------------------+
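I need to write an Oracle SQL query/procedure/function that lists the distinct words in that column, together with how often each one occurs.
The output should be: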
+------------------+-------+
| Word | Count |
+------------------+-------+
| MY | 3 |
| NAME | 3 |
| IS | 3 |
| SAJID | 1 |
| KHAN | 3 |
| AHMED | 1 |
| FRIEND | 1 |
| SALMAN | 1 |
+------------------+-------+
Word matching should be case-insensitive.
I am using Oracle 12.1.
Answer 0 (score: 1)
Let's assume we somehow manage to split each description into words. So instead of a single row with ID = 1 and Description = 'My NAME is Sajid KHAN', we have 5 rows like this:
ID | Description
--- | ------------
1 | My
1 | NAME
1 | is
1 | Sajid
1 | KHAN
With the data in this form, the count is trivial, something like
select Description, count(*) from data_in_new_form group by Description
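As a minimal sketch of that counting step, the split rows can be mocked with an inline view. Here data_in_new_form is a hypothetical stand-in for the already-split data, and upper() is applied in the grouping so that 'Name' and 'NAME' land in the same group, which the case-insensitivity requirement calls for.

```sql
-- data_in_new_form is a hypothetical stand-in for the already-split rows.
with data_in_new_form as (
  select 1 as id, 'My' as description from dual union all
  select 1, 'NAME' from dual union all
  select 2, 'My'   from dual union all
  select 2, 'Name' from dual
)
select upper(description) as word, count(*) as cnt
from data_in_new_form
group by upper(description)   -- fold case before grouping
order by cnt desc, word;
-- MY and NAME each come back with a count of 2
```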
So the remaining problem is the split itself. Let's do it with a recursive query.
create table mytable
as
select 1 as ID, 'My NAME is Sajid KHAN' as Description from dual
union all
select 2, 'My Name is Ahmed Khan' from dual
union all
select 3, 'MY friend name is Salman Khan' from dual
union all
select 4, 'test, punctuation! it is' from dual
;
with
rec (id, str, depth, element_value) as
(
-- Anchor member.
select id, upper(Description) as str, 1 as depth, REGEXP_SUBSTR( upper(Description), '(.*?)( |$)', 1, 1, NULL, 1 ) AS element_value
from mytable
UNION ALL
-- Recursive member.
select id, str, depth + 1, REGEXP_SUBSTR( str ,'(.*?)( |$)', 1, depth+1, NULL, 1 ) AS element_value
from rec
where depth < regexp_count(str, ' ')+1
)
, data as (
select * from rec
--order by id, depth
)
select element_value, count(*) from data
group by element_value
order by element_value
;
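Note that this version assumes words are separated by single spaces and does nothing about punctuation, so 'test,' and 'punctuation!' are counted with the punctuation attached.
UPDATE: an alternative way, using a hierarchical query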
with rec as
(
SELECT id, LEVEL AS depth,
REGEXP_SUBSTR( upper(description) ,'(.*?)( |$)', 1, LEVEL, NULL, 1 ) AS element_value
FROM mytable
CONNECT BY LEVEL <= regexp_count(description, ' ')+1
and prior id = id
and prior SYS_GUID() is not null
)
, data as (
select * from rec
--order by id, depth
)
select element_value, count(*) from data
group by element_value
order by 2 desc
;
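Both queries above split on spaces only. One possible variant, assuming a "word" simply means a run of letters, is to match the words themselves with '[[:alpha:]]+' instead of matching up to the next space. This is a sketch under that assumption, not a drop-in replacement: digits, apostrophes and hyphens would all act as separators here. Like the hierarchical query above, it relies on id being unique.

```sql
with words as (
  select id,
         -- the LEVEL-th run of letters in the description, upper-cased
         upper(regexp_substr(description, '[[:alpha:]]+', 1, level)) as word
  from mytable
  connect by level <= regexp_count(description, '[[:alpha:]]+')
         and prior id = id                  -- stay within the same row
         and prior sys_guid() is not null   -- keeps CONNECT BY from reporting a loop
)
select word, count(*) as cnt
from words
where word is not null                      -- skips rows whose description has no letters
group by word
order by cnt desc, word;
```

With the fourth sample row, 'test, punctuation! it is' then contributes TEST, PUNCTUATION, IT and IS rather than 'TEST,' and 'PUNCTUATION!'.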
Answer 1 (score: 0)
This query will work. The order of the words may differ from your listing, but the most frequent words come first, as they do there.
SELECT word,
COUNT(*)
FROM
(SELECT TRIM (REGEXP_SUBSTR (Description, '[^ ]+', 1, LEVEL) ) AS Word
FROM
(SELECT LISTAGG(UPPER(Description),' ') within GROUP(
ORDER BY ROWNUM ) AS Description
FROM mytable
)
CONNECT BY LEVEL <= REGEXP_COUNT ( Description, '[^ ]+')
)
GROUP BY WORD
ORDER BY 2 DESC;
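One caveat with this approach: LISTAGG returns a VARCHAR2, so concatenating every description into one string fails with ORA-01489 once the result exceeds the VARCHAR2 size limit (4000 bytes by default). A possible alternative that splits each row on its own, sketched here on the assumption that CROSS APPLY is available (it was introduced in Oracle 12.1), could look like this:

```sql
select w.word, count(*) as cnt
from mytable t
cross apply (
  -- one output row per space-separated token of this row's description
  select upper(regexp_substr(t.description, '[^ ]+', 1, level)) as word
  from dual
  connect by level <= regexp_count(t.description, '[^ ]+')
) w
where w.word is not null      -- skips empty or NULL descriptions
group by w.word
order by cnt desc, w.word;
```

Because the row generator runs once per source row, no intermediate string grows beyond a single description, and the 'prior sys_guid()' trick is not needed.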