Question

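I have a table with a string column that holds several delimited values, e.g. a;b;c.

I need to split this string and use its values in a query. For example, I have the following table: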

str
a;b;c
b;c;d
a;c;d

I need to group by the individual values in the str column to get the following result:

str count(*)
a   2
b   2
c   3
d   2

Can this be done with a single SELECT query? I can't create a temporary table, extract the values into it, and then query that table.

Answer 1

Following up on the comments to @PrzemyslawKruglej's answer:

The main problem is the inner query that uses connect by: it generates a staggering number of rows.
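
To make the blow-up concrete, here is a minimal sketch that counts the rows a naive connect by split produces. The inline `data` CTE is only an illustration that mirrors the sample strings from the question. Since the condition contains no PRIOR term, every row at one level becomes a child of every row at the previous level, so for 3 rows of 3 tokens each the count should come out as 3 + 9 + 27 = 39 rather than 9.

```sql
-- Illustrative inline data (same strings as in the question).
with data(str) as (
  select 'a;b;c' from dual union all
  select 'b;c;d' from dual union all
  select 'a;c;d' from dual
)
-- Naive split: nothing ties a child row to its parent row,
-- so the hierarchy fans out across all rows at every level.
select count(*) as naive_rows
  from data
connect by regexp_substr(str, '[^;]+', 1, level) is not null;
```

With more rows or longer strings the growth is exponential in the number of tokens, which is why the row generator is better kept separate from the table.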

The number of generated rows can be reduced with the following approach:

/* test table populated with the sample data from the question */
create table t1(str) as
  select 'a;b;c' from dual union all
  select 'b;c;d' from dual union all
  select 'a;c;d' from dual;

Table created.

--  The number of rows generated depends solely on the longest string.
--  If, say, the longest string contains 3 words (not counting the `;`
--  separators) and the table has 100 rows, we end up with 300 rows
--  for further processing, no more.
with occurrence(ocr) as( 
  select level 
    from ( select max(regexp_count(str, '[^;]+')) as mx_t
             from t1 ) t
    connect by level <= mx_t 
)
select count(regexp_substr(t1.str, '[^;]+', 1, o.ocr)) as generated_for_3_rows
  from t1
 cross join occurrence o;

Result: for three rows, the longest of which contains three words, 9 rows are generated:

GENERATED_FOR_3_ROWS
--------------------
                  9

Final query:

with occurrence(ocr) as( 
  select level 
    from ( select max(regexp_count(str, '[^;]+')) as mx_t
             from t1 ) t
    connect by level <= mx_t 
)
select res
     , count(res) as cnt
  from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
          from t1
         cross join occurrence o)
 where res is not null
 group by res
 order by res;

Result:

RES          CNT
----- ----------
a              2
b              2
c              3
d              2

SQLFiddle Demo

Find out more about the regexp_count() (11g and above) and regexp_substr() regular expression functions.
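
If regexp_count() is not available (before 11g), the upper bound for the row generator could be derived from the number of separators instead. The following is only a sketch under that assumption, run against the t1 test table from above. It counts `;` characters plus one, which may overestimate when a string contains empty tokens, but the surplus rows yield NULL and are filtered out anyway.

```sql
select res
     , count(res) as cnt
  from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
          from t1
         cross join (select level as ocr
                       -- separators + 1 = upper bound on tokens per row;
                       -- NVL covers strings made up of separators only,
                       -- where REPLACE returns NULL
                       from (select max(length(str)
                                      - nvl(length(replace(str, ';')), 0)
                                      + 1) as mx_t
                               from t1)
                    connect by level <= mx_t) o)
 where res is not null
 group by res
 order by res;
```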

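Note: regular expression functions are relatively expensive to compute, so when processing large amounts of data it may be worth considering a switch to plain PL/SQL. Here is an example.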
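
For illustration only, a PL/SQL-based split might look roughly like the sketch below. The type `str_tab` and the function `split_str` are hypothetical names made up for this example, not taken from the linked post, and the sketch assumes you are allowed to create a type and a function in your schema. It relies on instr() and substr() only, with no regular expressions.

```sql
-- Hypothetical collection type returned by the splitter.
create or replace type str_tab as table of varchar2(4000);
/

-- Hypothetical pipelined function: emits one row per token of p_str.
create or replace function split_str(
  p_str in varchar2,
  p_sep in varchar2 default ';'
) return str_tab pipelined
is
  l_start pls_integer := 1;  -- start position of the current token
  l_pos   pls_integer;       -- position of the next separator
begin
  if p_str is null then
    return;
  end if;
  loop
    l_pos := instr(p_str, p_sep, l_start);
    if l_pos = 0 then
      -- no more separators: the rest of the string is the last token
      pipe row (substr(p_str, l_start));
      exit;
    end if;
    pipe row (substr(p_str, l_start, l_pos - l_start));
    l_start := l_pos + length(p_sep);
  end loop;
  return;
end split_str;
/

-- Usage against the t1 test table; empty tokens come back as NULL
-- and are filtered out.
select s.column_value as res
     , count(*)       as cnt
  from t1
     , table(split_str(t1.str)) s
 where s.column_value is not null
 group by s.column_value
 order by s.column_value;
```

Whether this is actually faster than the regular expression version depends on the data volume and on the cost of switching between SQL and PL/SQL, so it is worth measuring both.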

Answer 2

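It's ugly, but it seems to work. The problem with splitting via CONNECT BY is that it returns duplicate rows. I managed to get rid of them, but you'll have to test it: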

WITH
  data AS (
    SELECT 'a;b;c' AS val FROM dual
    UNION ALL SELECT 'b;c;d' AS val FROM dual
    UNION ALL SELECT 'a;c;d' AS val FROM dual
  )
SELECT token, COUNT(1)
  FROM (
    SELECT DISTINCT token, lvl, val, p_val
      FROM (
        SELECT
            regexp_substr(val, '[^;]+', 1, level) AS token,
            level AS lvl,
            val,
            NVL(prior val, val) p_val
          FROM data
        CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
      )
    WHERE val = p_val
  )
GROUP BY token;

TOKEN                  COUNT(1)
-------------------- ----------
d                             2 
b                             2 
a                             2 
c                             3
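
A commonly used alternative is to stop the duplicates from being generated in the first place by tying each child row to its own parent inside the CONNECT BY clause. The sketch below assumes the strings are unique. With a real table you would compare a primary key (or rowid) instead of `val`, because identical strings would otherwise match each other. Note that the DISTINCT in the query above would likewise collapse identical strings into one.

```sql
with data(val) as (
  select 'a;b;c' from dual union all
  select 'b;c;d' from dual union all
  select 'a;c;d' from dual
)
select token
     , count(*) as cnt
  from (select regexp_substr(val, '[^;]+', 1, level) as token
          from data
        connect by regexp_substr(val, '[^;]+', 1, level) is not null
               -- stay on the same source row (assumes val is unique)
               and prior val = val
               -- non-deterministic term, keeps Oracle from reporting
               -- a CONNECT BY loop (ORA-01436)
               and prior sys_guid() is not null)
 group by token
 order by token;
```

For the sample data this generates 9 rows in the inner query, one per token, so no DISTINCT step is needed afterwards.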

Answer 3

SELECT name, COUNT(name)
  FROM (
    SELECT REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) AS name
      FROM dual CONNECT BY REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) IS NOT NULL
    UNION ALL
    SELECT REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) AS name
      FROM dual CONNECT BY REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) IS NOT NULL
    UNION ALL
    SELECT REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) AS name
      FROM dual CONNECT BY REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) IS NOT NULL
  )
 GROUP BY name;

NAME  COUNT(NAME)
----- -----------
d               2
a               2
b               2
c               3
