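I'm stuck on a problem. I'm a MySQL beginner and I was assigned to write a query that extracts a value from a URL. I have thousands of URLs like https://www.google.com/search?source=, and I need to extract the part from the last '/' up to the first '?' (in this case 'search'). It's not that simple: sometimes the scheme is http, sometimes the number of '/' characters differs, sometimes the address is malformed (I need to ignore those cases), and sometimes I have a plain https://google.com/search with no '?' (those cases need to be ignored as well). This is what I have so far, but I feel helpless. Any suggestions?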
select distinct SUBSTRING_INDEX(col,'/',0) col from table where length(col) - length(replace(col, '/', '')) >= 1
and length(col) - length(replace(col, '?', '')) >= 1
and col = 'value'
and col <> ''
and col is not null
order by date
limit 600;
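Answer 0 (score: 0)
Test for the question mark first. If one exists, take the substring up to that position, reverse it, and find the slash that precedes the '?'. Once I know where the '/' and the '?' are, I can extract that part of the URL: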
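select
url
-- everything before the first '?' (empty when there is no '?')
, substr(url,1,qmarkpos-1)
-- distance of the last '/' from the end of that prefix
, case when qmarkpos > 0 then instr(reverse(substr(url,1,qmarkpos-1)),'/') end
-- start at the last '/'; the length argument is deliberately oversized,
-- so the result keeps the leading '/'
, substr(substr(url,1,qmarkpos-1),qmarkpos-instr(reverse(substr(url,1,qmarkpos-1)),'/'),qmarkpos) as result
from (
-- qmarkpos is 0 when the URL contains no '?'
select url, instr(url,'?') qmarkpos
from (
select 'https://www.google.com/search?source=' as url union all
select 'https://www.google.com/search' as url union all
select 'https://www.google.com/search?source=uqu iqu iugqidug iqugd' as url union all
select 'https://www.google.com/search?source/qh ohqod hi=' as url
) d
) d2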
+---+-------------------------------------------------------------+-------------------------------+--------------+---------+
| | url | substr(url,1,qmarkpos-1) | case ... end | result |
+---+-------------------------------------------------------------+-------------------------------+--------------+---------+
| 1 | https://www.google.com/search?source= | https://www.google.com/search | 7 | /search |
| 2 | https://www.google.com/search | | NULL | |
| 3 | https://www.google.com/search?source=uqu iqu iugqidug iqugd | https://www.google.com/search | 7 | /search |
| 4 | https://www.google.com/search?source/qh ohqod hi= | https://www.google.com/search | 7 | /search |
+---+-------------------------------------------------------------+-------------------------------+--------------+---------+
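
If the leading '/' is not wanted and the rows without a '?' should be dropped rather than returned empty, a shorter variant may be enough. The following is only a sketch under a few assumptions: it relies on MySQL's SUBSTRING_INDEX, the alias segment is made up for illustration, and the REGEXP pattern is just one guess at what counts as a valid address (http or https, a host, then at least one '/'); adjust it to whatever "wrong address" means in your data.

```sql
-- Sketch only: `segment` is an illustrative alias, and the REGEXP filter is an
-- assumed definition of a "valid" address, not something given in the question.
select
    url
    -- inner call: everything before the first '?'
    -- outer call: everything after the last '/' of that prefix
  , substring_index(substring_index(url, '?', 1), '/', -1) as segment
from (
    select 'https://www.google.com/search?source=' as url union all
    select 'https://www.google.com/search' as url union all
    select 'https://www.google.com/search?source=uqu iqu iugqidug iqugd' as url union all
    select 'https://www.google.com/search?source/qh ohqod hi=' as url
) d
where instr(url, '?') > 0              -- skip URLs that have no '?'
  and url regexp '^https?://[^/?]+/'   -- assumed validity check: scheme, host, then a path
;
```

With the same four sample URLs, the second row is filtered out and the remaining three should give 'search'. Because the '?' is cut off first, slashes that appear after it (as in the fourth URL) do not affect the result. A URL such as https://www.google.com/?x=1 would pass the filter and yield an empty string, so an extra condition on the extracted value may be needed if such rows occur.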