Find the row with the longest matching string across two columns

Time: 2012-11-01 13:06:51

Tags: mysql sql

Say I have this:

page_url                      | canvas_url
---------------------------------------------------------------
http://www.google.com/        | http://www.google.com/barfoobaz
http://www.google.com/foo/bar | http://www.google.com/foo

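I want to find the row whose column value is a prefix of a given string, ordered by the longest match. The problem I'm facing is ordering by the length of the string that actually matched, rather than just picking a matching row that also happens to contain a long string. For example: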

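http://www.google.com/foo matches page_url in row 1 and canvas_url in row 2, but if I order by the lengths of both columns rather than by the length of the match, row 1 is considered the better match because the canvas_url in row 1 is longer.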

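I can either grab all the matches and then filter on length in code, for example:

SELECT *, LENGTH(canvas_url), LENGTH(page_url)
FROM app
WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
   OR 'http://www.google.com/foo' LIKE CONCAT(page_url, '%')

or run two subselects that grab the top match for canvas_url and page_url respectively and then filter those down to one in code, but I'd prefer (barring any ridiculous performance problems) to have the database return exactly what I need.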
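For reference, the "grab all matches, then filter in code" option could look roughly like the sketch below. It assumes SQLite through Python's sqlite3 module, an added id column, and the two sample rows above; make_db and best_match are hypothetical helper names, and SQLite's || operator stands in for CONCAT.

```python
import sqlite3

TARGET = "http://www.google.com/foo"

def make_db():
    """Build an in-memory table holding the two sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app (id INTEGER PRIMARY KEY, page_url TEXT, canvas_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO app (page_url, canvas_url) VALUES (?, ?)",
        [
            ("http://www.google.com/", "http://www.google.com/barfoobaz"),
            ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
        ],
    )
    return conn

def best_match(conn, target):
    """Fetch every row where either column is a prefix of target, then rank in Python."""
    rows = conn.execute(
        "SELECT id, page_url, canvas_url FROM app "
        "WHERE ? LIKE page_url || '%' OR ? LIKE canvas_url || '%'",
        (target, target),
    ).fetchall()

    def match_len(row):
        _, page_url, canvas_url = row
        # Count only the columns that really are a prefix of the target;
        # default=0 covers rows that LIKE accepted via wildcards or case folding.
        return max(
            (len(u) for u in (page_url, canvas_url)
             if u is not None and target.startswith(u)),
            default=0,
        )

    return max(rows, key=match_len, default=None)

print(best_match(make_db(), TARGET))
```

The ranking key is the length of the column that matched, not the longer of the two columns, which is exactly the distinction the question is about.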

I'm primarily concerned with MySQL, but I also need to target SQLite and PostgreSQL, so I'd be happy with an answer for any one of them.

Suggestions?
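One portability wrinkle with targeting all three databases: string concatenation is spelled differently (MySQL treats || as logical OR unless PIPES_AS_CONCAT is enabled, and older SQLite releases have no CONCAT function), and LIKE also treats any % or _ inside the stored URL as a wildcard. A prefix test built from SUBSTR and LENGTH sidesteps both. The sketch below is only checked against SQLite via Python's sqlite3; I'm assuming it carries over to MySQL and PostgreSQL for plain ASCII URLs (MySQL's LENGTH counts bytes, so CHAR_LENGTH may be needed otherwise). The second sample row is made up to show the wildcard problem.

```python
import sqlite3

TARGET = "http://www.google.com/foo"

# Prefix test without LIKE: compare the leading LENGTH(col) characters of the target to col.
PREFIX_SQL = """
SELECT page_url FROM app
WHERE SUBSTR(:t, 1, LENGTH(page_url)) = page_url
ORDER BY LENGTH(page_url) DESC
"""

# Same intent written with LIKE, using SQLite's || for concatenation.
LIKE_SQL = """
SELECT page_url FROM app
WHERE :t LIKE page_url || '%'
ORDER BY LENGTH(page_url) DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app (page_url TEXT)")
conn.executemany(
    "INSERT INTO app VALUES (?)",
    [
        ("http://www.google.com/",),
        ("http://www.google.com/f_o",),  # hypothetical URL containing an underscore
    ],
)

like_hits = [r[0] for r in conn.execute(LIKE_SQL, {"t": TARGET})]
prefix_hits = [r[0] for r in conn.execute(PREFIX_SQL, {"t": TARGET})]

print(like_hits)    # includes the f_o row, because _ matches any single character
print(prefix_hits)  # only the genuine prefix
```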

3 answers:

Answer 0 (score: 3)

This orders by the length of the actual match (not just the longest URL in the record) and returns the longest one:

-- Get page_url matches
SELECT *, LENGTH(page_url) AS MatchLen
FROM app 
WHERE 'http://www.google.com/foo' LIKE CONCAT(page_url, '%') -- can't tell from question if this should be reversed
UNION ALL
-- Get canvas_url matches
SELECT *, LENGTH(canvas_url) AS MatchLen
FROM app 
WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
-- Bring the longest matches to the top
ORDER BY MatchLen DESC -- May need to add a tie-breaker here
LIMIT 1
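To try the same UNION ALL idea outside MySQL, here is a minimal sketch against SQLite using Python's sqlite3. The id column, the explicit column list, and the id tie-breaker are my additions, and || replaces CONCAT; the rows are the two from the question.

```python
import sqlite3

TARGET = "http://www.google.com/foo"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app (id INTEGER PRIMARY KEY, page_url TEXT, canvas_url TEXT)"
)
conn.executemany(
    "INSERT INTO app (page_url, canvas_url) VALUES (?, ?)",
    [
        ("http://www.google.com/", "http://www.google.com/barfoobaz"),
        ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
    ],
)

# Each branch reports the length of the column that matched;
# the ORDER BY and LIMIT apply to the combined result.
QUERY = """
SELECT id, page_url, canvas_url, LENGTH(page_url) AS match_len
FROM app
WHERE :target LIKE page_url || '%'
UNION ALL
SELECT id, page_url, canvas_url, LENGTH(canvas_url) AS match_len
FROM app
WHERE :target LIKE canvas_url || '%'
ORDER BY match_len DESC, id ASC
LIMIT 1
"""

longest = conn.execute(QUERY, {"target": TARGET}).fetchone()
print(longest)
```

Row 1 matches on page_url (22 characters) and row 2 on canvas_url (25 characters), so row 2 comes back even though row 1 holds the longest URL overall.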

Here is a running example on SqlFiddle.

Answer 1 (score: 1)

Maybe you just need something like this?

SELECT page_url as url, LENGTH(page_url) as len
FROM pages WHERE 'http://www.google.com/foo' LIKE CONCAT(page_url, '%')
UNION
SELECT canvas_url as url, LENGTH(canvas_url) as len
FROM pages WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
ORDER BY len DESC
LIMIT 1

Answer 2 (score: 0)

If you only need the first row, you need an ORDER BY. You just have to be a bit clever about how you arrange it:

-- Note: this matches columns that start with the search string,
-- the reverse of the predicate used in the question
SELECT *, LENGTH(canvas_url), LENGTH(page_url)
FROM app
WHERE canvas_url like concat('http://www.google.com/foo', '%') OR
      page_url like concat('http://www.google.com/foo', '%')
-- Sort by the length of the longer matching column
order by (case when canvas_url like concat('http://www.google.com/foo', '%') and
                    page_url like concat('http://www.google.com/foo', '%') and
                    LENGTH(canvas_url) < LENGTH(page_url)
               then LENGTH(page_url)
               when canvas_url like concat('http://www.google.com/foo', '%') and
                    page_url like concat('http://www.google.com/foo', '%') and
                    LENGTH(canvas_url) >= LENGTH(page_url)
               then LENGTH(canvas_url)
               when canvas_url like concat('http://www.google.com/foo', '%')
               then LENGTH(canvas_url)
               else LENGTH(page_url)
          end) desc -- longest match first
limit 1

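This orders by the longer of the matching strings and then returns exactly one row. Note that LIMIT is not standard SQL, so different databases have different mechanisms for returning a single row.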
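The same single-scan idea can also be written in the prefix direction used in the question, with one CASE per column instead of nested conditions. This is only a sketch under assumptions: it runs on SQLite via Python's sqlite3, uses SQLite's two-argument MAX (MySQL and PostgreSQL would need GREATEST instead), and adds an id column plus a made-up third row in which both columns match.

```python
import sqlite3

TARGET = "http://www.google.com/foo"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app (id INTEGER PRIMARY KEY, page_url TEXT, canvas_url TEXT)"
)
conn.executemany(
    "INSERT INTO app (page_url, canvas_url) VALUES (?, ?)",
    [
        ("http://www.google.com/", "http://www.google.com/barfoobaz"),
        ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
        ("http://www.google.com/", "http://www.google.com/fo"),  # hypothetical row
    ],
)

# A column contributes its length only when it is a prefix of the target;
# the row's score is the larger of the two contributions.
QUERY = """
SELECT id,
       MAX(CASE WHEN :t LIKE page_url || '%' THEN LENGTH(page_url) ELSE 0 END,
           CASE WHEN :t LIKE canvas_url || '%' THEN LENGTH(canvas_url) ELSE 0 END)
           AS match_len
FROM app
WHERE :t LIKE page_url || '%' OR :t LIKE canvas_url || '%'
ORDER BY match_len DESC, id ASC
LIMIT 1
"""

best = conn.execute(QUERY, {"t": TARGET}).fetchone()
print(best)
```

Because a non-matching column scores 0, the long canvas_url in row 1 never influences the ordering.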