How to extract unique words from a cell and count them

Time: 2019-05-15 07:21:33

Tags: sql oracle

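I have a column "DESCRIPTION" (VARCHAR2(500 BYTE)).

As a result I want two columns: first, extract the unique words from every cell and show them in one column; second, count how often each word occurs.

I also have a limiting parameter, "ENTRYDATE" (i.e. "WHERE ENTRYDATE BETWEEN 20180101 AND 20190101"), because the table is huge.

I have a partial solution in Excel, but it is messy and painful to use.

Is this even possible in Oracle with a SELECT?

Example: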

Row number | Explanation

1 | roses are red violets are blue
2 | red violets 
3 | red
4 | roses
5 | blue

Result:

WORDS | COUNTING

roses | 2
are | 2
red | 3
violets | 2
blue | 2

Query variant:

with test as
      (select 1 as nor, 'roses are red violets are blue' as explanation from dual union all
       select 2 as nor, 'red violets' as explanation from dual union all
       select 3 as nor, 'red'  as explanation from dual union all
       select 4 as nor, 'roses'  as explanation from dual union all
       select 5 as nor, 'blue'   as explanation from dual
      ),
    temp as
      (select nor,
             trim(column_value) word
      from test join xmltable(('"' || replace(explanation, ' ', '","') ||'"')) on 1 = 1
     )
   select word,
          count(*)
   from temp
   group by word
   order by word;

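It returns ORA-00905: missing keyword

3 answers:

Answer 0 (score: 0)

Split the explanation into rows (so that you get the individual words), then apply the COUNT function to those words.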

-- sample data from the question
with test (nor, explanation) as
  (select 1, 'roses are red violets are blue' from dual union all
   select 2, 'red violets'                    from dual union all
   select 3, 'red'                            from dual union all
   select 4, 'roses'                          from dual union all
   select 5, 'blue'                           from dual
  ),
-- one row per word: column_value runs from 1 to (number of spaces + 1)
temp as
  (select nor,
          regexp_substr(explanation, '[^ ]+', 1, column_value) word
   from test join table(cast(multiset(select level from dual
                                      connect by level <= regexp_count(explanation, ' ') + 1
                                     ) as sys.odcinumberlist)) on 1 = 1
  )
-- count how often each word occurs
select word,
       count(*)
from temp
group by word
order by word;

WORD                             COUNT(*)
------------------------------ ----------
are                                     2
blue                                    2
red                                     3
roses                                   2
violets                                 2

You mentioned the ENTRYDATE column, but the sample data does not contain one, so include it in the TEMP CTE if necessary.
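As a rough sketch of what that could look like against a real table: the table name your_table below is hypothetical, while DESCRIPTION and ENTRYDATE are the columns named in the question. It reuses the REGEXP_SUBSTR / REGEXP_COUNT approach from the query above, so it assumes a database version that has those functions.

```sql
-- Sketch only: your_table is a hypothetical name for the real table;
-- DESCRIPTION and ENTRYDATE are the columns named in the question.
with temp as
  (select regexp_substr(t.description, '[^ ]+', 1, column_value) word
   from your_table t
        join table(cast(multiset(select level from dual
                                 connect by level <= regexp_count(t.description, ' ') + 1
                                ) as sys.odcinumberlist)) on 1 = 1
   -- restrict the rows to the date range from the question
   where t.entrydate between 20180101 and 20190101
  )
select word,
       count(*) counting
from temp
where word is not null   -- skips empty pieces caused by trailing or repeated spaces
group by word
order by word;
```

The extra "word is not null" filter is an addition for illustration: a trailing or doubled space makes the row generator produce one more row than there are words, and REGEXP_SUBSTR returns NULL for that row.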

[EDIT: hehe, Oracle 9i ... back to the dark ages]

See if this helps; I hope it does:

-- sample data from the question
with test (nor, explanation) as
  (select 1, 'roses are red violets are blue' from dual union all
   select 2, 'red violets'                    from dual union all
   select 3, 'red'                            from dual union all
   select 4, 'roses'                          from dual union all
   select 5, 'blue'                           from dual
  ),
-- turn 'a b c' into the sequence "a","b","c"; XMLTABLE returns one row per item
temp as
  (select nor,
          trim(column_value) word
   from test join xmltable(('"' || replace(explanation, ' ', '","') ||'"')) on 1 = 1
  )
-- count how often each word occurs
select word,
       count(*)
from temp
group by word
order by word;

WORD                   COUNT(*)
-------------------- ----------
are                           2
blue                          2
red                           3
roses                         2
violets                       2

Answer 1 (score: 0)

-- Oracle 12c+
with test (nor, explanation) as (
select 1, 'roses are red violets are blue' from dual union all
select 2, 'red violets'                    from dual union all
select 3, 'red'                            from dual union all
select 4, 'roses'                          from dual union all
select 5, 'blue'                           from dual)
select regexp_substr(explanation, '\S+', 1, lvl) word, count(*) cnt
from test,
lateral(
select rownum lvl
from dual
connect by level <= regexp_count(explanation, '\S+')
)
group by regexp_substr(explanation, '\S+', 1, lvl);

WORD                                  CNT
------------------------------ ----------
roses                                   2
are                                     2
violets                                 2
red                                     3
blue                                    2

Answer 2 (score: 0)

The problem is your old Oracle version. This query should work; it uses only basic connect by, instr and dbms_random:

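-- t is the source table with the columns id and description
select word, count(1) counting
  from (
    -- cut out the text between pos1 and pos2; pos2 = 0 means it is the last word
    select id, trim(case pos2 when 0 then substr(description, pos1) 
                              else substr(description, pos1, pos2 - pos1) 
                    end) word
      from (
        -- one row per word: pos1 is the preceding space (1 for the first word),
        -- pos2 is the next space (0 if there is none)
        select id, description, 
               case level when 1 then 1 else instr(description, ' ', 1, level - 1) end pos1, 
               instr(description, ' ', 1, level) pos2
          from t 
          -- "prior dbms_random.value is not null" is always true; it only keeps
          -- CONNECT BY from reporting a loop when a row is connected to itself
          connect by prior dbms_random.value is not null 
                 and prior id = id 
                 and level <= length(description) - length(replace(description, ' ', '')) + 1))
  group by word;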
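To try the query without a real table, a sketch along the following lines supplies the question's sample rows under the name t. The CTE and its column aliases are assumptions made for illustration, written in the same aliasing style as the question's own query; it has not been verified on Oracle 9i.

```sql
-- Sketch only: the CTE named t stands in for the real table and holds the
-- sample rows from the question; id and description are the column names
-- the query expects.
with t as
  (select 1 as id, 'roses are red violets are blue' as description from dual union all
   select 2 as id, 'red violets'                    as description from dual union all
   select 3 as id, 'red'                            as description from dual union all
   select 4 as id, 'roses'                          as description from dual union all
   select 5 as id, 'blue'                           as description from dual
  )
select word, count(1) counting
  from (
    select id, trim(case pos2 when 0 then substr(description, pos1)
                              else substr(description, pos1, pos2 - pos1)
                    end) word
      from (
        select id, description,
               case level when 1 then 1 else instr(description, ' ', 1, level - 1) end pos1,
               instr(description, ' ', 1, level) pos2
          from t
          connect by prior dbms_random.value is not null
                 and prior id = id
                 and level <= length(description) - length(replace(description, ' ', '')) + 1))
  where word is not null   -- added here: drops empty pieces from trailing or repeated spaces
  group by word
  order by word;
```

The "where word is not null" and "order by word" clauses are additions to the answer's query; without the filter, a description that ends in a space contributes a NULL word to the result.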
