I have a table tbl1 with two columns, col1 and col2, that contain strings:
col1    | col2
--------+--------
bar     | foo
foo     | foobar
bar1foo | bar2foo
The corresponding SQL dump:
CREATE TABLE `tbl1` (
`col1` varchar(20) COLLATE latin1_general_ci NOT NULL,
`col2` varchar(20) COLLATE latin1_general_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
INSERT INTO `tbl1` (`col1`, `col2`) VALUES
('bar', 'foo'),
('foo', 'foobar'),
('bar1foo', 'bar2foo');
In most cases, the two strings in a row share a common prefix. I need a query that removes those common prefixes. Expected result:
bar  | foo
     | bar
1foo | 2foo
My approach so far:
SELECT
SUBSTR(`col1`, 1+GREATEST(LENGTH(`col1`), LENGTH(`col2`)) - CEIL(LENGTH(TRIM(TRAILING '0' FROM HEX(ABS(CONV(HEX(REVERSE(`col1`)),16,10) - CONV(HEX(REVERSE(`col2`)),16,10)))))/2)),
SUBSTR(`col2`, 1+GREATEST(LENGTH(`col1`), LENGTH(`col2`)) - CEIL(LENGTH(TRIM(TRAILING '0' FROM HEX(ABS(CONV(HEX(REVERSE(`col1`)),16,10) - CONV(HEX(REVERSE(`col2`)),16,10)))))/2))
FROM tbl1
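Short explanation: the strings are reversed (REVERSE), converted to integers (HEX and CONV), and subtracted from each other (- and ABS). The difference is converted to its hexadecimal representation (HEX) and the 0s are trimmed from its end (TRIM). The length of this result is subtracted from the length of the longest string (-, LENGTH and GREATEST), and SUBSTR then uses that value to produce the result.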
Problems with my approach:
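One limitation can be reproduced directly. MySQL documents CONV() as working with 64-bit precision, so the integer conversion in the query above can only represent reversed strings of up to 8 bytes. A minimal sketch, where both literals are hypothetical values chosen only for their length:

```sql
-- Both literals are illustrative; they are not rows required by the question.
SELECT
    CONV(HEX(REVERSE('bar1foo')), 16, 10)    AS seven_bytes,  -- 7 bytes, fits in 64 bits
    CONV(HEX(REVERSE('bar1foobar')), 16, 10) AS ten_bytes;    -- 10 bytes, wider than 64 bits
```

The second value cannot be the true integer for the 10-byte string, so the subtraction step is unreliable for longer values of a varchar(20) column.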
Answer 0 (score: 1)
This code works, although it is verbose, ugly, and (perhaps) slow:
select
    substring(t.col1, g.maxlen + 1) col1,
    substring(t.col2, g.maxlen + 1) col2
from tbl1 t inner join (
    select t.col1, t.col2,
        max(case when left(col1, tt.n) = left(col2, tt.n) then tt.n else 0 end) maxlen
    from tbl1 t inner join (
        select 1 n union all select 2 union all select 3 union all select 4 union all
        select 5 union all select 6 union all select 7 union all select 8 union all
        select 9 union all select 10 union all select 11 union all select 12 union all
        select 13 union all select 14 union all select 15 union all select 16 union all
        select 17 union all select 18 union all select 19 union all select 20
    ) tt on least(length(t.col1), length(t.col2)) >= tt.n
    group by t.col1, t.col2
) g on g.col1 = t.col1 and g.col2 = t.col2
See the demo.
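To see what the inner subquery computes, the aggregation can be run for a single hard-coded pair. This is a minimal sketch; the two literals and the shortened four-row number list are illustrative assumptions, not part of the answer's query:

```sql
-- For each candidate length n, keep n if the first n characters match, else 0,
-- then take the largest value: that is the length of the common prefix.
select max(case when left('bar1foo', n) = left('bar2foo', n) then n else 0 end) as maxlen
from (
    select 1 n union all select 2 union all select 3 union all select 4
) tt;
```

With these inputs maxlen is 3: the first three characters match and the fourth does not. Taking the maximum is safe because once two prefixes differ at some length, every longer prefix differs as well.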
For MySQL 8.0+ you can use a recursive CTE, in which case you do not need to know the column lengths in advance:
with
    recursive lengths as (
        select 1 n
        union all
        select n + 1
        from lengths
        where n < (select max(least(length(col1), length(col2))) from tbl1)
    ),
    cte as (
        select t.col1, t.col2,
            max(case when left(col1, l.n) = left(col2, l.n) then l.n else 0 end) maxlen
        from tbl1 t inner join lengths l
            on least(length(t.col1), length(t.col2)) >= l.n
        group by t.col1, t.col2
    )
select
    substring(t.col1, c.maxlen + 1) col1,
    substring(t.col2, c.maxlen + 1) col2
from tbl1 t inner join cte c
    on c.col1 = t.col1 and c.col2 = t.col2
See the demo.
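The lengths CTE in the query above is only a number generator. Run on its own with a fixed upper bound (5 here, a hypothetical value standing in for the subquery on tbl1), it yields the integers 1 through 5:

```sql
-- Stand-alone version of the number generator; the bound 5 is illustrative.
with recursive lengths as (
    select 1 n
    union all
    select n + 1
    from lengths
    where n < 5
)
select n from lengths;
```

MySQL limits recursion through the cte_max_recursion_depth system variable, which defaults to 1000, so a generator bounded by the 20-character column width stays well inside the limit.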
Results:
| col1 | col2 |
| ---- | ---- |
| | bar |
| bar | foo |
| 1foo | 2foo |
Answer 1 (score: 1)
Sadly, the most general approach that still performs well is probably a giant case expression. However, it only works up to a fixed length:
select substr(col1, prefix_length + 1),
       substr(col2, prefix_length + 1)
from (select tbl1.*,
             (case when left(col1, 10) = left(col2, 10) then 10
                   when left(col1, 9) = left(col2, 9) then 9
                   . . .
                   else 0
              end) as prefix_length
      from tbl1
     ) t;
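The order of the branches matters: they have to run from the longest prefix to the shortest, so that the first branch that matches is the longest match. A minimal complete sketch, capped at three characters (an arbitrary limit chosen for illustration, not a reconstruction of the elided branches above):

```sql
-- Longest candidate first; the first branch that matches wins.
select col1, col2,
       (case when left(col1, 3) = left(col2, 3) then 3
             when left(col1, 2) = left(col2, 2) then 2
             when left(col1, 1) = left(col2, 1) then 1
             else 0
        end) as prefix_length
from tbl1;
```

On the sample data this gives 0 for bar/foo, 3 for foo/foobar, and 3 for bar1foo/bar2foo. Any pair sharing more than three characters would be under-trimmed, which is the fixed-length limitation mentioned above.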
Actually, you can do this with a recursive CTE, which is the most general approach:
with recursive cte as (
      select col1, col2, 1 as lev, col1 as orig_col1, col2 as orig_col2
      from tbl1
      union all
      select substr(col1, 2), substr(col2, 2), lev + 1, orig_col1, orig_col2
      from cte
      -- stop once either string is exhausted; without this guard two identical
      -- strings keep matching as '' = '' until MySQL aborts the recursion
      where length(col1) > 0 and length(col2) > 0 and
            left(col1, 1) = left(col2, 1)
     )
select col1, col2
from (select cte.*,
             dense_rank() over (partition by orig_col1, orig_col2 order by lev desc) as seqnum
      from cte
     ) x
where seqnum = 1;
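Although its performance will certainly be worse than your solution or the giant case expression, it may not be that bad, and you may find it good enough for your purposes.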
Here is a db<>fiddle with both solutions.
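To follow what the recursive CTE does, it helps to trace a single pair. This sketch replaces tbl1 with one hard-coded row (the literals are taken from the sample data; the stand-alone form is an assumption made for illustration) and lists every level instead of keeping only the last one:

```sql
-- Each recursion level strips one leading character from both strings
-- for as long as the leading characters are equal.
with recursive cte as (
    select 'bar1foo' as col1, 'bar2foo' as col2, 1 as lev
    union all
    select substr(col1, 2), substr(col2, 2), lev + 1
    from cte
    where left(col1, 1) = left(col2, 1)
)
select lev, col1, col2
from cte
order by lev;
```

The trace has four rows, and the row with the highest lev holds 1foo and 2foo. That is the row the dense_rank() filter selects per original pair in the full query.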