I have the following table:
B_ID I_ID R_ID
W00001 1234 1235,1237
B00001 1235 1236,1235
T00001 1236 1235,1235,1235
X00001 1237 1234,1236,1238
M00001 1238 1238
I need to produce the following output using SQL:
B_ID I_ID R_ID
W00001 1234 B00001|X00001
B00001 1235 T00001|B00001
T00001 1236 B00001
X00001 1237 W00001|T00001|M00001
M00001 1238 M00001
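Example: in row 1 the value of R_ID is 1235,1237. Both 1235 and 1237 exist in I_ID, so their corresponding B_ID values, B00001 and X00001, are selected, and the expected output is B00001|X00001.
Answer 0 (score: 1)
Without duplicates and without relying on any magic numbers:
Oracle Setup: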
CREATE TABLE test_data ( b_id, i_id, r_id ) as
select 'W00001', 1234, '1235,1237' from dual union all
select 'B00001', 1235, '1236,1235' from dual union all
select 'T00001', 1236, '1235,1235,1235' from dual union all
select 'X00001', 1237, '1234,1236,1238' from dual union all
select 'M00001', 1238, '1238' from dual;
Query:
SELECT b_id,
       i_id,
       ( SELECT LISTAGG( t.b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM   TABLE( CAST( MULTISET(
                  SELECT DISTINCT
                         TO_NUMBER( REGEXP_SUBSTR( d.r_id, '\d+', 1, LEVEL ) )
                  FROM   DUAL
                  CONNECT BY LEVEL <= REGEXP_COUNT( d.r_id, '\d+' )
                ) AS SYS.ODCINUMBERLIST ) ) v
                INNER JOIN test_data t
                ON ( v.COLUMN_VALUE = t.i_id ) ) AS r_id
FROM   test_data d;
Explanation:
The inner correlated select:
SELECT DISTINCT
TO_NUMBER( REGEXP_SUBSTR( d.r_id, '\d+', 1, LEVEL ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( d.r_id, '\d+' )
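This select takes the r_id of a single row and splits its comma-separated values into one row per value; the DISTINCT clause means that only unique values are output.
Wrapping it in TABLE( CAST( MULTISET( ... ) AS collection_type ) ) turns it into a table collection expression so that it can be joined to another table.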
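To see the table collection expression in isolation, the sketch below runs the same construct against a literal string instead of d.r_id. The literal '1235,1235,1237' is an arbitrary example value, not part of the original answer.

```sql
-- Illustration only: the literal '1235,1235,1237' stands in for d.r_id.
SELECT v.COLUMN_VALUE
FROM   TABLE( CAST( MULTISET(
         SELECT DISTINCT
                TO_NUMBER( REGEXP_SUBSTR( '1235,1235,1237', '\d+', 1, LEVEL ) )
         FROM   DUAL
         CONNECT BY LEVEL <= REGEXP_COUNT( '1235,1235,1237', '\d+' )
       ) AS SYS.ODCINUMBERLIST ) ) v;
```

This should return one row for each distinct value (1235 and 1237); the order of the rows is not guaranteed.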
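The result is then joined back to test_data to translate the i_id values into b_id values, and LISTAGG() is used to re-aggregate the multiple rows back into a single row.
Output:
B_ID I_ID R_ID
------ ---------- --------------------
W00001 1234 B00001|X00001
B00001 1235 T00001|B00001
T00001 1236 B00001
X00001 1237 W00001|T00001|M00001
M00001 1238 M00001
Option 2:
Oracle Setup: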
CREATE OR REPLACE TYPE numberlist IS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION split_Number_List(
  i_str   IN VARCHAR2,
  i_delim IN VARCHAR2 DEFAULT ','
) RETURN numberlist DETERMINISTIC
AS
  p_result numberlist := numberlist();
  p_start  NUMBER(5) := 1;
  p_end    NUMBER(5);
  c_len    CONSTANT NUMBER(5) := LENGTH( i_str );
  c_ld     CONSTANT NUMBER(5) := LENGTH( i_delim );
BEGIN
  IF c_len > 0 THEN
    -- Emit one element for each delimiter found in the string.
    p_end := INSTR( i_str, i_delim, p_start );
    WHILE p_end > 0 LOOP
      p_result.EXTEND;
      p_result( p_result.COUNT ) := TO_NUMBER( SUBSTR( i_str, p_start, p_end - p_start ) );
      p_start := p_end + c_ld;
      p_end := INSTR( i_str, i_delim, p_start );
    END LOOP;
    -- Emit whatever follows the last delimiter.
    IF p_start <= c_len + 1 THEN
      p_result.EXTEND;
      p_result( p_result.COUNT ) := TO_NUMBER( SUBSTR( i_str, p_start, c_len - p_start + 1 ) );
    END IF;
  END IF;
  RETURN p_result;
END;
/
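As a quick check of the function above, it can be called on its own through a table collection expression. This is only a sketch; the literal list is an arbitrary example value.

```sql
-- Illustration only: the literal list is an arbitrary example value.
-- Returns one row per element of the list, duplicates included.
SELECT COLUMN_VALUE
FROM   TABLE( split_Number_List( '1235,1235,1237' ) );

-- SET() removes duplicate elements from the nested table first.
SELECT COLUMN_VALUE
FROM   TABLE( SET( split_Number_List( '1235,1235,1237' ) ) );
```

The first query should return three rows and the second only the two distinct values.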
Query:
SELECT b_id,
       i_id,
       ( SELECT LISTAGG( t.b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM   TABLE( SET( split_Number_List( d.r_id ) ) ) v
                INNER JOIN test_data t
                ON ( v.COLUMN_VALUE = t.i_id ) ) AS r_id
FROM   test_data d;
(Same output as above.)
Option 3:
Query:
SELECT b_id,
       i_id,
       ( SELECT LISTAGG( t.b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM   test_data t
         WHERE  ',' || d.r_id || ',' LIKE '%,' || t.i_id || ',%' ) AS r_id
FROM   test_data d;
(Same output as above.)
The performance of Option 3 (the LIKE-based query) can be improved with function-based indexes on ',' || r_id || ',' and '%,' || i_id || ',%'.
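A minimal sketch of what such function-based indexes might look like is shown below. The index names are hypothetical, and the sketch assumes Oracle accepts both expressions as indexable; whether the optimizer actually uses them for the LIKE predicate depends on the data and the execution plan, so check the plan before relying on them.

```sql
-- Sketch only: the index names are hypothetical.
CREATE INDEX test_data__r_id__fbi ON test_data ( ',' || r_id || ',' );
CREATE INDEX test_data__i_id__fbi ON test_data ( '%,' || i_id || ',%' );
```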
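Answer 1 (score: 0)
In the solution below, I use a standard technique to split each comma-separated string into its components (tokens) in the factored subquery prep. Then I join back to the original table to replace each token (which is an i_id) with the corresponding b_id, and finally put the tokens back together into a pipe-separated string with listagg().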
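The tokenizing step can be tried on its own. The sketch below applies the same regexp_substr call to a single literal string, which stands in for r_id and is used only for illustration.

```sql
-- Illustration only: tokenize one literal string with the same regexp_substr call.
select level as n,
       regexp_substr('1234,1236,1238', '([^,]+)', 1, level, null, 1) as token
from   dual
connect by level <= regexp_count('1234,1236,1238', ',') + 1;
```

This should return three rows, pairing the positions 1, 2 and 3 with the tokens 1234, 1236 and 1238.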
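Note: this solution assumes that each r_id has fewer than 100 tokens (see the "magic number" 100 in the definition of idx). If it is known that each r_id has no more than 9 tokens, 100 can be changed to 10 (resulting in faster processing). If no upper bound is known beforehand, you can change 100 to some ridiculously large number; 4000 will do if r_id is anything but a CLOB, since VARCHAR2 and the like are limited to 4000 characters.
Thanks to MT0 for reminding me to add this note.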
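One way to choose a bound for idx is to measure the data first. The sketch below assumes test_data exists as an actual table (as in the Oracle setup of the other answer) rather than only as a factored subquery; the column alias max_tokens is made up for illustration.

```sql
-- Largest number of tokens in any r_id; idx must generate at least this many rows.
select max(regexp_count(r_id, ',') + 1) as max_tokens
from   test_data;
```

Because idx is defined with level < 100, the number used there has to be strictly greater than the value returned.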
with test_data ( b_id, i_id, r_id ) as (
       select 'W00001', 1234, '1235,1237' from dual union all
       select 'B00001', 1235, '1236,1235' from dual union all
       select 'T00001', 1236, '1235,1235,1235' from dual union all
       select 'X00001', 1237, '1234,1236,1238' from dual union all
       select 'M00001', 1238, '1238' from dual
     ),
     idx ( n ) as (
       select level from dual connect by level < 100
     ),
     prep ( b_id, i_id, n, token ) as (
       select t.b_id, t.i_id, i.n,
              regexp_substr(t.r_id, '([^,]+)', 1, i.n, null, 1)
       from   test_data t join idx i
              on i.n <= regexp_count(t.r_id, ',') + 1
     )
select   p.b_id, p.i_id,
         listagg(t.b_id, '|') within group (order by p.n) as r_id
from     prep p join test_data t
         on p.token = t.i_id
group by p.b_id, p.i_id
order by p.i_id;
B_ID I_ID R_ID
------ ---------- ------------------------------
W00001 1234 B00001|X00001
B00001 1235 T00001|B00001
T00001 1236 B00001|B00001|B00001
X00001 1237 W00001|T00001|M00001
M00001 1238 M00001
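ADDED info, based on further conversation with MT0.
I edited this "added info" again based on more conversation with MT0. Thank you, MT0, for keeping me on my toes!
In the solution below I do away with the magic number 100 and instead use a common technique for handling multiple input rows with connect by level. I also show a common technique for handling duplicates (among the tokens obtained from the comma-separated input strings).
Query: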
with
     test_data ( b_id, i_id, r_id ) as (
       select 'W00001', 1234, '1235,1237' from dual union all
       select 'B00001', 1235, '1236,1235' from dual union all
       select 'T00001', 1236, '1235,1235,1235' from dual union all
       select 'X00001', 1237, '1234,1236,1238' from dual union all
       select 'M00001', 1238, '1238' from dual
     ),
     prep ( b_id, i_id, n, token ) as (
       select b_id, i_id, level,
              regexp_substr(r_id, '([^,]+)', 1, level, null, 1)
       from   test_data t
       connect by level <= regexp_count(r_id, ',') + 1
              and prior r_id = r_id              -- to only generate the rows needed
              and prior sys_guid() is not null   -- this is unique, to avoid cycles
     ),
     z ( b_id, i_id, n, token, rn ) as (
       select b_id, i_id, n, token,
              row_number() over (partition by i_id, token order by n)
       from   prep
     )
select   z.b_id, z.i_id,
         listagg(t.b_id, '|') within group (order by z.n) as r_id
from     z join test_data t
         on z.token = t.i_id
where    z.rn = 1
group by z.b_id, z.i_id
order by i_id;
Result:
B_ID I_ID R_ID
------ ---------- ------------------------------
W00001 1234 B00001|X00001
B00001 1235 T00001|B00001
T00001 1236 B00001
X00001 1237 W00001|T00001|M00001
M00001 1238 M00001
5 rows selected.
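Answer 2 (score: 0)
I decided to add another answer because it uses a completely different method: recursive subquery factoring, available since Oracle version 11.2.
I did some testing with an input (persistent) table called test_data, with 9000 rows, where each r_id is a comma-separated string of 200 tokens; the structure is very similar to the OP's small sample in the original post. I tried three methods: the hierarchical query (using connect by and the prior sys_guid() trick), which I proposed; a solution based on a correlated subquery and nested tables, posted by MT0; and the recursive query I show below. In each case, I used the query as the select... portion of a CTAS statement.
(To compare apples to apples, I modified MT0's query by removing the extra information that the "tokens" in r_id are numbers; I treat them as strings, just as in the other two methods.)
Recursive query: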
with
     prep ( b_id, i_id, str, n, st_pos, end_pos, token ) as (
       select b_id, i_id, ',' || r_id || ',', -1, null, 1, null
       from   test_data
       union all
       select b_id, i_id, str, n+1, end_pos + 1, instr(str, ',', 1, n+3),
              substr(str, st_pos, end_pos - st_pos)
       from   prep
       where  end_pos != 0
     ),
     z ( b_id, i_id, n, token, rn ) as (
       select b_id, i_id, n, token,
              row_number() over (partition by i_id, token order by n)
       from   prep
     )
select   z.b_id, z.i_id,
         listagg(t.b_id, '|') within group (order by z.n) as r_id
from     z join test_data t
         on z.token = t.i_id
where    z.rn = 1
group by z.b_id, z.i_id
;
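In fact, one can squeeze out a little extra performance: in the anchor part of the recursive CTE (the first member of the union all in the definition of prep), I could start with n = 0, st_pos = 1 and end_pos = the position of the first comma. (That is actually the second comma in the altered string; I found it easier to add commas at the beginning and the end of the input CSV string and to write the recursive CTE the way I did.) However, this saves only one iteration out of 200 for each string, which would save about 0.5% of the execution time. I find the way I wrote the recursive CTE easier to follow.
For completeness, here is the modified version of the "nested table" method I used (credit @MT0):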