我知道这已经在某种程度上得到了PHP和MYSQL的回答,但我想知道是否有人可以教我最简单的方法将字符串(逗号分隔)拆分为Oracle 10g(最好)和11g中的多行。
表格如下:
Name | Project | Error
108 test Err1, Err2, Err3
109 test2 Err1
我想创建以下内容:
Name | Project | Error
108 Test Err1
108 Test Err2
108 Test Err3
109 Test2 Err1
我已经看到了一些围绕堆栈的潜在解决方案,但是它们只占了一个列(以逗号分隔的字符串)。任何帮助将不胜感激。
答案 0 :(得分:101)
使用大型数据集时,接受的答案性能较差。
这可能是一种改进的方式(也可以使用正则表达式和连接方式):
with temp as
(
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
select distinct
t.name, t.project,
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as error
from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
order by name
修改强>: 以下是查询的简单说明(如“非深入”)。
length (regexp_replace(t.error, '[^,]+')) + 1
使用regexp_replace
来删除任何不是分隔符(本例中为逗号)和length +1
的内容,以获取有多少元素(错误)。 select level from dual connect by level <= (...)
使用分层查询创建一个匹配数越来越多的列,从1到错误总数。
预览:
select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 as max
from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1
table(cast(multiset(.....) as sys.OdciNumberList))
做了一些oracle类型的转换。
cast(multiset(.....)) as sys.OdciNumberList
将多个集合(原始数据集中每行的一个集合)转换为单个数字集合OdciNumberList。table()
函数将集合转换为结果集。 FROM
会在数据集和多集之间创建交叉联接。
因此,数据集中包含4个匹配项的行将重复4次(名为“column_value”的列中的数字越来越多)。
预览:
select * from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))
使用column_value
作为regexp_substr
的 nth_appearance / ocurrence 参数。t.name, t.project
作为示例),以便于显示。对Oracle文档的一些引用:
答案 1 :(得分:28)
正则表达式很棒:)
with temp as (
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name
答案 2 :(得分:28)
以下两者之间存在巨大差异:
如果不限制行,则 CONNECT BY 子句会产生多行,并且不会提供所需的输出。
除了正则表达式之外,还有一些其他选择正在使用:
<强>设置强>
SQL> CREATE TABLE t (
2 ID NUMBER GENERATED ALWAYS AS IDENTITY,
3 text VARCHAR2(100)
4 );
Table created.
SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');
1 row created.
SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');
1 row created.
SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');
1 row created.
SQL> COMMIT;
Commit complete.
SQL>
SQL> SELECT * FROM t;
ID TEXT
---------- ----------------------------------------------
1 word1, word2, word3
2 word4, word5, word6
3 word7, word8, word9
SQL>
使用 XMLTABLE :
SQL> SELECT id,
2 trim(COLUMN_VALUE) text
3 FROM t,
4 xmltable(('"'
5 || REPLACE(text, ',', '","')
6 || '"'))
7 /
ID TEXT
---------- ------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
SQL>
使用 MODEL 子句:
SQL> WITH
2 model_param AS
3 (
4 SELECT id,
5 text AS orig_str ,
6 ','
7 || text
8 || ',' AS mod_str ,
9 1 AS start_pos ,
10 Length(text) AS end_pos ,
11 (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
12 0 AS element_no ,
13 ROWNUM AS rn
14 FROM t )
15 SELECT id,
16 trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
17 FROM (
18 SELECT *
19 FROM model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
20 DIMENSION BY (element_no)
21 MEASURES (start_pos, end_pos, element_count)
22 RULES ITERATE (2000)
23 UNTIL (ITERATION_NUMBER+1 = element_count[0])
24 ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
25 end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
26 )
27 WHERE element_no != 0
28 ORDER BY mod_str ,
29 element_no
30 /
ID TEXT
---------- --------------------------------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
SQL>
答案 3 :(得分:8)
还有几个相同的例子:
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/
另外,可以使用DBMS_UTILITY.comma_to_table&amp; table_to_comma: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table
答案 4 :(得分:6)
我想提出一种使用PIPELINED表函数的不同方法。它有点类似于XMLTABLE的技术,除了你提供自己的自定义函数来分割字符串:
-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/
-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
p_string VARCHAR2,
p_delimiter CHAR DEFAULT ','
)
RETURN typ_str2tbl_nst PIPELINED
AS
l_tmp VARCHAR2(32000) := p_string || p_delimiter;
l_pos NUMBER;
BEGIN
LOOP
l_pos := INSTR( l_tmp, p_delimiter );
EXIT WHEN NVL( l_pos, 0 ) = 0;
PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
l_tmp := SUBSTR( l_tmp, l_pos+1 );
END LOOP;
END str2tbl;
/
-- The problem solution
SELECT name,
project,
TRIM(COLUMN_VALUE) error
FROM t, TABLE(str2tbl(error));
结果:
NAME PROJECT ERROR
---------- ---------- --------------------
108 test Err1
108 test Err2
108 test Err3
109 test2 Err1
这种方法的问题在于,优化器通常不会知道表函数的基数,因此必须进行猜测。这可能会对您的执行计划产生潜在的危害,因此可以扩展此解决方案以为优化程序提供执行统计信息。
您可以通过在上面的查询上运行EXPLAIN PLAN来查看此优化程序估算值:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16336 | 366K| 59 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 16336 | 366K| 59 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 8168 | 16336 | 28 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
即使集合只有3个值,优化器估计它有8168行(默认值)。这一开始看起来似乎无关紧要,但优化者可能已经足够决定一个次优计划。
解决方案是使用优化程序扩展来提供集合的统计信息:
-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
dummy NUMBER,
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER,
STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo,
p_stats OUT SYS.ODCITabFuncStats,
p_args IN SYS.ODCIArgDescList,
p_string IN VARCHAR2,
p_delimiter IN CHAR DEFAULT ',' )
RETURN NUMBER
);
/
-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER
AS
BEGIN
p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
RETURN ODCIConst.SUCCESS;
END ODCIGetInterfaces;
-- This function is responsible for returning the cardinality estimate
STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo,
p_stats OUT SYS.ODCITabFuncStats,
p_args IN SYS.ODCIArgDescList,
p_string IN VARCHAR2,
p_delimiter IN CHAR DEFAULT ',' )
RETURN NUMBER
AS
BEGIN
-- I'm using basically half the string lenght as an estimator for its cardinality
p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
RETURN ODCIConst.SUCCESS;
END ODCIStatsTableFunction;
END;
/
-- Associate our optimizer extension with the PIPELINED function
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;
测试生成的执行计划:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 59 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 23 | 59 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 1 | 2 | 28 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
正如您所看到的,上面计划中的基数不再是8196猜测值了。它仍然不正确,因为我们正在将一列而不是字符串文字传递给该函数。
在这种特殊情况下,有必要对函数代码进行一些调整以进行更接近的估计,但我认为整体概念在这里有很多解释。
本答案中使用的str2tbl函数最初由Tom Kyte开发: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061
通过阅读本文,可以进一步探索将统计信息与对象类型相关联的概念: http://www.oracle-developer.net/display.php?id=427
此处描述的技术适用于10g +。
答案 5 :(得分:4)
在Oracle 11i之前未添加REGEXP_COUNT。这是从Art的解决方案中采用的Oracle 10g解决方案。
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <=
LENGTH('Err1, Err2, Err3')
- LENGTH(REPLACE('Err1, Err2, Err3', ',', ''))
+ 1;
答案 6 :(得分:4)
我认为我连接的最佳方式和regexp功能
with temp as (
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name
答案 7 :(得分:2)
不使用通过连接或正则表达式:
with mytable as (
select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual
union all
select 109, 'test2', 'Err1' from dual
)
,x as (
select name
,project
,','||error||',' error
from mytable
)
,iter as (SELECT rownum AS pos
FROM all_objects
)
select x.name,x.project
,SUBSTR(x.error
,INSTR(x.error, ',', 1, iter.pos) + 1
,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1
) error
from x, iter
where iter.pos < = (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1;
答案 8 :(得分:2)
从Oracle 12c开始,您可以使用JSON_TABLE
和JSON_ARRAY
:
CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION
SELECT 109,'test2','Err1' FROM dual;
并查询:
SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
'$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;
输出:
┌──────┬─────────┬──────────────────┬──────┐
│ Name │ Project │ Error │ P │
├──────┼─────────┼──────────────────┼──────┤
│ 108 │ test │ Err1, Err2, Err3 │ Err1 │
│ 108 │ test │ Err1, Err2, Err3 │ Err2 │
│ 108 │ test │ Err1, Err2, Err3 │ Err3 │
│ 109 │ test2 │ Err1 │ Err1 │
└──────┴─────────┴──────────────────┴──────┘
<强> db<>fiddle demo 强>
答案 9 :(得分:1)
我想添加另一种方法。这个使用递归查询,这是我在其他答案中没有看到的。自11gR2以来,Oracle一直支持它。
with cte0 as (
select phone_number x
from hr.employees
), cte1(xstr,xrest,xremoved) as (
select x, x, null
from cte0
union all
select xstr,
case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end,
case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end
from cte1
where xrest is not null
)
select xstr, xremoved from cte1
where xremoved is not null
order by xstr
分裂角色非常灵活。只需在INSTR
来电中更改它。
答案 10 :(得分:1)
以下是使用XMLTABLE的替代实现,允许转换为不同的数据类型:
select
xmltab.txt
from xmltable(
'for $text in tokenize("a,b,c", ",") return $text'
columns
txt varchar2(4000) path '.'
) xmltab
;
...或者如果您的分隔字符串存储在表的一行或多行中:
select
xmltab.txt
from (
select 'a;b;c' inpt from dual union all
select 'd;e;f' from dual
) base
inner join xmltable(
'for $text in tokenize($input, ";") return $text'
passing base.inpt as "input"
columns
txt varchar2(4000) path '.'
) xmltab
on 1=1
;
答案 11 :(得分:1)
我遇到了同样的问题,xmltable帮助了我:
SELECT id,trim(COLUMN_VALUE)文本 FROM t,xmltable((''''|| REPLACE(text,',','“,”')||'“'))
答案 12 :(得分:0)
在Oracle 11g和更高版本中,您可以使用递归子查询和简单的字符串函数(这可能比正则表达式和相关的分层子查询快):
Oracle设置:
CREATE TABLE table_name ( name, project, error ) as
select 108, 'test', 'Err1, Err2, Err3' from dual union all
select 109, 'test2', 'Err1' from dual;
查询:
WITH table_name_error_bounds ( name, project, error, start_pos, end_pos ) AS (
SELECT name,
project,
error,
1,
INSTR( error, ', ', 1 )
FROM table_name
UNION ALL
SELECT name,
project,
error,
end_pos + 2,
INSTR( error, ', ', end_pos + 2 )
FROM table_name_error_bounds
WHERE end_pos > 0
)
SELECT name,
project,
CASE end_pos
WHEN 0
THEN SUBSTR( error, start_pos )
ELSE SUBSTR( error, start_pos, end_pos - start_pos )
END AS error
FROM table_name_error_bounds
输出:
NAME | PROJECT | ERROR ---: | :------ | :---- 108 | test | Err1 109 | test2 | Err1 108 | test | Err2 108 | test | Err3
db <>提琴here
答案 13 :(得分:0)
如果安装了Oracle APEX 5.1或更高版本,则可以使用便捷的APEX_STRING.split
功能,例如:
select q.Name, q.Project, s.column_value as Error
from mytable q,
APEX_STRING.split(q.Error, ',') s
第二个参数是定界符字符串。它还接受第三个参数来限制您希望执行的拆分次数。
答案 14 :(得分:-1)
我曾经使用过DBMS_UTILITY.comma_to _table函数 代码如下
declare
l_tablen BINARY_INTEGER;
l_tab DBMS_UTILITY.uncl_array;
cursor cur is select * from qwer;
rec cur%rowtype;
begin
open cur;
loop
fetch cur into rec;
exit when cur%notfound;
DBMS_UTILITY.comma_to_table (
list => rec.val,
tablen => l_tablen,
tab => l_tab);
FOR i IN 1 .. l_tablen LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i));
END LOOP;
end loop;
close cur;
end;
我使用了自己的表名和列名