Find the most frequently occurring character in a string

Asked: 2015-02-23 13:53:06

Tags: sql oracle

Using an Oracle SQL query, can we do the following?

      Input       Output
    'aaaabcd' --->  'a'
    '0001001' --->  '0'

That is, find the character that occurs most often in a string?

4 answers:

Answer 0 (score: 6)

Yes, this can be done with CONNECT BY, although it is a bit involved:

SELECT xchar, xcount FROM (
    SELECT xchar, COUNT(*) AS xcount, RANK() OVER ( ORDER BY COUNT(*) DESC) AS rn
      FROM (
        SELECT SUBSTR('aaaabcd', LEVEL, 1) AS xchar
          FROM dual
       CONNECT BY LEVEL <= LENGTH('aaaabcd')
   ) GROUP BY xchar
) WHERE rn = 1;

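The innermost query breaks the string into individual characters. The next level takes COUNT(*) grouped by character and uses RANK() to find the maximum. Note that if several characters tie for the highest count, more than one row is returned.

The query above returns the most frequent character together with the number of times it occurs.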
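To make the splitting step concrete, here is the innermost query on its own, as a minimal sketch. The `pos` column alias is added here for illustration and is not part of the original answer.

```sql
-- Innermost step only: CONNECT BY LEVEL generates one row per
-- character position, and SUBSTR picks the character at that position.
SELECT LEVEL                       AS pos,    -- illustrative alias
       SUBSTR('aaaabcd', LEVEL, 1) AS xchar
  FROM dual
CONNECT BY LEVEL <= LENGTH('aaaabcd');
```

For 'aaaabcd' this produces seven rows, one per position. The outer levels of the answer's query then only have to group and rank those rows.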

If you have a table containing many strings, you will need something like the following:

WITH strlen AS (
  SELECT LEVEL AS strind
    FROM dual
 CONNECT BY LEVEL <= 30
)
SELECT id, xchar, xcount FROM (
    SELECT id, xchar, COUNT(*) AS xcount, RANK() OVER ( PARTITION BY id ORDER BY COUNT(*) DESC) AS rn
      FROM (
        SELECT s.id, SUBSTR(s.str, sl.strind, 1) AS xchar
          FROM strings s, strlen sl
         WHERE LENGTH(s.str) >= sl.strind
   ) GROUP BY id, xchar
) WHERE rn = 1;

Here 30 is a magic number that must be equal to or greater than the length of the longest string. Alternatively, you can avoid the magic number as follows:

WITH strlen AS (
  SELECT LEVEL AS strind
    FROM dual
 CONNECT BY LEVEL <= ( SELECT MAX(LENGTH(str)) FROM strings )
)
SELECT id, xchar, xcount FROM (
    SELECT id, xchar, COUNT(*) AS xcount, RANK() OVER ( PARTITION BY id ORDER BY COUNT(*) DESC) AS rn
      FROM (
        SELECT s.id, SUBSTR(s.str, sl.strind, 1) AS xchar
          FROM strings s, strlen sl
         WHERE LENGTH(s.str) >= sl.strind
   ) GROUP BY id, xchar
) WHERE rn = 1;

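Both multi-row queries assume a table named strings with columns id and str. The answer does not show its definition, so the following setup is a hypothetical sketch for trying the queries out. The column types, sizes, and sample rows are assumptions.

```sql
-- Hypothetical sample table; types and sizes are assumptions,
-- not taken from the original answer.
CREATE TABLE strings (
  id  NUMBER PRIMARY KEY,
  str VARCHAR2(30)
);

INSERT INTO strings (id, str) VALUES (1, 'aaaabcd');
INSERT INTO strings (id, str) VALUES (2, '0001001');
INSERT INTO strings (id, str) VALUES (3, '11002');
COMMIT;
```

With these rows, id 3 should come back twice from either query, because '1' and '0' each occur twice in '11002' and RANK() keeps ties.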

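Answer 1 (score: 2)

Here is one way, assuming you want to show every row for the characters with the highest count in each string (ties included):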

with sample_data as (select 'aaaabcd' str from dual union all
                     select '0001001' str from dual union all
                     select '11002' str from dual),
         pivoted as (select str, substr(str, level, 1) letter
                     from   sample_data
                     connect by level <= length(str)
                                and prior str = str
                                and prior dbms_random.value is not null),
             grp as (select str, letter, count(*) cnt
                     from   pivoted
                     group by str, letter),
          ranked as (select str,
                            letter,
                            dense_rank() over (partition by str order by cnt desc) dr
                     from   grp)
select str, letter
from   ranked
where  dr = 1;

STR     LETTER
------- ------
0001001 0     
11002   1     
11002   0     
aaaabcd a     

If you want to show only one of the letters when there is a tie, change dense_rank() to row_number() in the query above.
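A sketch of that single-letter variant is below. The secondary sort key `letter` is an assumption added here so that the choice among tied letters is repeatable. The answer itself does not specify a tie-breaker.

```sql
WITH sample_data AS (
  SELECT 'aaaabcd' str FROM dual UNION ALL
  SELECT '0001001' str FROM dual UNION ALL
  SELECT '11002'   str FROM dual
),
pivoted AS (
  -- One row per character of each string
  SELECT str, SUBSTR(str, LEVEL, 1) letter
    FROM sample_data
  CONNECT BY LEVEL <= LENGTH(str)
         AND PRIOR str = str
         AND PRIOR dbms_random.value IS NOT NULL
),
grp AS (
  SELECT str, letter, COUNT(*) cnt
    FROM pivoted
   GROUP BY str, letter
),
ranked AS (
  -- row_number() assigns distinct ranks, so exactly one row per string gets 1.
  -- Ordering by letter as well is an assumed tie-breaker.
  SELECT str, letter,
         ROW_NUMBER() OVER (PARTITION BY str ORDER BY cnt DESC, letter) rn
    FROM grp
)
SELECT str, letter
  FROM ranked
 WHERE rn = 1;
```

With this tie-breaker, '11002' yields '0' rather than both '0' and '1'.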

If you want to show all tied letters in a single row (comma-separated, for example), use listagg in the final query to aggregate the rows into one.
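The listagg suggestion could be sketched as follows, reusing the same sample data. The output alias `letters` and the ordering inside WITHIN GROUP are illustrative choices, and LISTAGG requires Oracle 11g Release 2 or later.

```sql
WITH sample_data AS (
  SELECT 'aaaabcd' str FROM dual UNION ALL
  SELECT '0001001' str FROM dual UNION ALL
  SELECT '11002'   str FROM dual
),
pivoted AS (
  SELECT str, SUBSTR(str, LEVEL, 1) letter
    FROM sample_data
  CONNECT BY LEVEL <= LENGTH(str)
         AND PRIOR str = str
         AND PRIOR dbms_random.value IS NOT NULL
),
grp AS (
  SELECT str, letter, COUNT(*) cnt
    FROM pivoted
   GROUP BY str, letter
),
ranked AS (
  -- dense_rank() keeps every tied letter at rank 1
  SELECT str, letter,
         DENSE_RANK() OVER (PARTITION BY str ORDER BY cnt DESC) dr
    FROM grp
)
-- Collapse the tied letters of each string into one comma-separated value
SELECT str,
       LISTAGG(letter, ',') WITHIN GROUP (ORDER BY letter) AS letters
  FROM ranked
 WHERE dr = 1
 GROUP BY str;
```

For '11002' the letters column would contain '0,1', while the other two strings each get a single letter.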

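Answer 2 (score: 1)

One option is to do it in PL/SQL. Why PL/SQL?

PL/SQL is likely to be more readable, can easily be reused inside larger queries, and may be more efficient. If you need this frequency for two columns of a table that is also filtered by certain criteria, the pure SQL solution becomes almost unreadable and may even upset the query plan. The function is also deterministic, so Oracle can cache results for rows with identical content.

In addition, you can use this function in a virtual column or a function-based index.

A quick (and probably not very reliable) benchmark comparing PL/SQL with the suggested CONNECT BY solution on an 11g database, over more than 10K rows, showed a runtime of about 40 seconds for CONNECT BY and 2 seconds for PL/SQL.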

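CREATE OR REPLACE
FUNCTION get_most_freq_char( p_input VARCHAR2 )
RETURN VARCHAR2
DETERMINISTIC  -- needed for virtual columns and function-based indexes
IS
  -- Maps each character to the number of times it has been seen so far
  TYPE t_charcount IS TABLE OF SIMPLE_INTEGER
                      INDEX BY VARCHAR2(1);
  l_map      t_charcount;
  l_value    VARCHAR2(1);
  l_maxchar  VARCHAR2(1);
BEGIN
  -- NVL guards against NULL input, for which LENGTH returns NULL
  FOR i IN 1 .. NVL( LENGTH( p_input ), 0 )
  LOOP
    l_value := SUBSTR( p_input, i ,1 );

    -- Increment the counter, or start it at 1 on first sight
    l_map( l_value ) := CASE WHEN l_map.EXISTS( l_value )
                             THEN l_map( l_value ) + 1
                             ELSE 1 END;

    -- Strict ">" means that on a tie the character that reached
    -- the highest count first is kept
    IF l_maxchar IS NULL OR l_map( l_value ) > l_map( l_maxchar )
    THEN
      l_maxchar := l_value;
    END IF;

  END LOOP;

  RETURN l_maxchar;
END;
/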

SELECT get_most_freq_char( 'abcdeffffffbbbaaaaaa' ) FROM DUAL;
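The virtual-column and function-based-index uses mentioned above might look like the sketch below. The table and index names are hypothetical, and both statements assume that get_most_freq_char is declared DETERMINISTIC, which Oracle requires for these features. The CAST and SUBSTR wrappers are assumptions added here to keep the derived column and the index key one character wide. They are not part of the original answer.

```sql
-- Option 1 (hypothetical table): expose the result as a virtual column.
-- The CAST keeps the derived column at one character instead of the
-- function's unconstrained VARCHAR2 return type.
CREATE TABLE strings_vc (
  id       NUMBER PRIMARY KEY,
  str      VARCHAR2(30),
  top_char GENERATED ALWAYS AS
             ( CAST( get_most_freq_char(str) AS VARCHAR2(1 CHAR) ) ) VIRTUAL
);

-- Option 2 (hypothetical table): a function-based index on a plain table.
-- SUBSTR bounds the length of the index key.
CREATE TABLE strings_fbi (
  id  NUMBER PRIMARY KEY,
  str VARCHAR2(30)
);

CREATE INDEX strings_fbi_top_char_ix
    ON strings_fbi ( SUBSTR( get_most_freq_char(str), 1, 1 ) );

-- A query has to repeat the indexed expression for the index to be usable.
SELECT id, str
  FROM strings_fbi
 WHERE SUBSTR( get_most_freq_char(str), 1, 1 ) = 'a';
```

The two options are shown on separate tables because Oracle does not allow a function-based index on an expression that already backs a virtual column of the same table. In that case the virtual column itself can be indexed instead.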

Answer 3 (score: 0)

In addition to all the great answers: suppose you have a table like this:

FULL_STRING
-----------
0001230
aaaabcd
bbbbcdef


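SELECT * FROM
(
 SELECT full_str
      , str max_char_in_string
        -- Rank characters by occurrence count; str breaks ties repeatably
      , ROW_NUMBER() OVER (PARTITION BY full_str ORDER BY COUNT(*) DESC, str) rno
   FROM
   (
    -- One row per character; the PRIOR conditions keep each string in
    -- its own hierarchy (assumes full_str values are unique)
    SELECT full_str, SUBSTR(full_str, LEVEL, 1) AS str
      FROM drop_tab
   CONNECT BY LEVEL <= LENGTH(full_str)
          AND PRIOR full_str = full_str
          AND PRIOR SYS_GUID() IS NOT NULL
   )
  GROUP BY full_str, str
 )
WHERE rno = 1
ORDER BY 1
/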

FULL_STRING MAX_CHAR RNO
-------------------------
0001230      0       1
aaaabcd      a       1
bbbbcdef     b       1