SQL - Find the most complete string in a column of a table

Date: 2016-05-24 18:14:52

Tags: sql postgresql

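Every time a user searches for text on the website, the search text is logged to search_table. Sub-searches are logged as well; they are marked with an asterisk.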

The goal is to find the most complete search text the user searched for.

Ideally, it would work like this:

        Group ids 1, 4, 6 and obtain id = 6
        Group ids 2, 5, 7 and obtain id = 7
        Group id 3 and obtain id = 3
        Group ids 8, 9 and obtain id = 9

SEARCH_TABLE

            id user   search_text
            --------------------
            1  user1  data manag*
            2  user1  confer*
            3  user1  incomplete sear*
            4  user1  data managem*
            5  user1  conference c*
            6  user1  data management
            7  user1  conference call
            8  user1  status in*
            9  user1  status information

The output should be:

        user  search_text
        ---------------------
        user1 data management
        user1 conference call
        user1 incomplete sear*
        user1 status information

Can you help?

3 answers:

Answer 0 (score: 0)

The following should do the job:

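SELECT * FROM
    SEARCH_TABLE st
    WHERE
    NOT EXISTS (
    SELECT 1 FROM
        SEARCH_TABLE st2
        WHERE st2."user" = st."user"   -- compare only searches by the same user
        AND st2.id <> st.id            -- a row always matches itself, so skip it
        -- remove the asterisk and append % to get a prefix pattern
        AND st2.search_text LIKE replace(st.search_text, '*', '') || '%'
    )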

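This filters out every search whose text (without the asterisk) is a prefix of another search by the same user, so only the most complete version of each search remains.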
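To try the NOT EXISTS filter without a PostgreSQL server, the following Python sketch loads the question's sample rows into an in-memory SQLite database (standard "sqlite3" module) and runs the query there. It assumes the table and column names from the question ("search_table", "id", "user", "search_text"). It compares only searches by the same user and excludes the row itself ("st2.id <> st.id"), because every search trivially matches its own prefix pattern. Note that SQLite's LIKE is case-insensitive for ASCII by default, unlike PostgreSQL's, so results can differ on mixed-case data.

```python
import sqlite3

# Sample rows from the question: (id, user, search_text).
ROWS = [
    (1, "user1", "data manag*"),
    (2, "user1", "confer*"),
    (3, "user1", "incomplete sear*"),
    (4, "user1", "data managem*"),
    (5, "user1", "conference c*"),
    (6, "user1", "data management"),
    (7, "user1", "conference call"),
    (8, "user1", "status in*"),
    (9, "user1", "status information"),
]

# "user" is double-quoted because USER is reserved in PostgreSQL;
# SQLite accepts the quoted identifier as well.
QUERY = """
SELECT st.id, st."user", st.search_text
FROM search_table st
WHERE NOT EXISTS (
    SELECT 1
    FROM search_table st2
    WHERE st2."user" = st."user"   -- only searches by the same user
      AND st2.id <> st.id          -- a row always matches its own pattern
      -- drop the asterisk and append % to build a prefix pattern
      AND st2.search_text LIKE replace(st.search_text, '*', '') || '%'
)
ORDER BY st.id
"""


def most_complete(rows):
    """Return rows whose text (minus '*') is not a prefix of another search by the same user."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE search_table (id INTEGER, "user" TEXT, search_text TEXT)')
        conn.executemany("INSERT INTO search_table VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in most_complete(ROWS):
    print(row)
```

It prints the rows with ids 3, 6, 7 and 9, which matches the expected output. Because % and _ are LIKE wildcards, search texts containing those characters would need escaping; the sample data has none.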

Answer 1 (score: 0)

This may not be the most elegant way, but here is one approach (written in SQL Server / T-SQL syntax):

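   alter table your_table
   add group_id int
   -- T-SQL compiles a whole batch at once, so end the batch here; otherwise
   -- the UPDATE below fails because group_id does not exist yet at compile time
   go

   -- one group per user and 5-character prefix
   select [user], left(search_text, 5) as Group_Text, IDENTITY(int, 1,1) as Group_ID
   into #group_id_table
   from your_table
   group by [user], left(search_text, 5)
   order by [user], left(search_text, 5)

   update a
   set a.group_id = b.group_id
   from your_table as a
   join #group_id_table as b
   on a.[user] = b.[user]
   and left(a.search_text, 5) = b.group_text

   -- the greatest string in each group is taken as the most complete one
   select [user], max(search_text) as search_text, group_id
   from your_table
   group by [user], group_id
   order by [user], group_id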

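When I ran it, this produced the expected result. Of course, because group_id is based on a fixed prefix length that you choose (the first 5 characters here), it can go wrong: unrelated searches that share their first five characters end up in the same group, and partial searches shorter than the prefix (for example "dat*") land in a different group from their completed text. I hope this helps.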
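The caveat about the fixed prefix length is easy to reproduce. The sketch below is a simplified SQLite port of the answer's final step, not the answer's exact code: it assumes substr(search_text, 1, 5) as a stand-in for T-SQL LEFT(search_text, 5) and groups directly in GROUP BY instead of through a temp table with IDENTITY. With the question's rows it returns the four expected texts; adding one hypothetical search, "data warehouse", puts it in the same "data " group as "data management", and MAX then keeps only "data warehouse".

```python
import sqlite3

# Rows from the question: (id, user, search_text).
ROWS = [
    (1, "user1", "data manag*"),
    (2, "user1", "confer*"),
    (3, "user1", "incomplete sear*"),
    (4, "user1", "data managem*"),
    (5, "user1", "conference c*"),
    (6, "user1", "data management"),
    (7, "user1", "conference call"),
    (8, "user1", "status in*"),
    (9, "user1", "status information"),
]

# Hypothetical extra search that shares its first five characters with "data management".
EXTRA = [(10, "user1", "data warehouse")]

# Group by user and 5-character prefix, keep the greatest string per group.
QUERY = """
SELECT "user", MAX(search_text) AS longest_text
FROM your_table
GROUP BY "user", substr(search_text, 1, 5)
ORDER BY longest_text
"""


def group_by_prefix(rows):
    """Run the prefix-grouping query against the given rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE your_table (id INTEGER, "user" TEXT, search_text TEXT)')
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(group_by_prefix(ROWS))          # expected groups
print(group_by_prefix(ROWS + EXTRA))  # "data management" is swallowed by "data warehouse"
```

MAX works on the sample only because "*" sorts below letters in a byte-wise comparison, so the completed text is the greatest string in each group.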

Answer 2 (score: 0)

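Give this a try. I separate the completed texts (and their shorter parts) from the incomplete ones, then find the longest part for each record. I tested it in Oracle because I don't have access to PostgreSQL right now, but I didn't use anything exotic, so it should work. The user column is called USR here, because USER is a reserved word.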

with 
  --Contains all completed searches
  COMPLETE   as (select * from SEARCH_TABLE where SEARCH_TEXT not like '%*'),
  --Contains all searches that are incomplete and don't have a completed match
  INCOMPLETE as (
    select S.* 
    from SEARCH_TABLE S 
    left join COMPLETE C 
      on  S.USR = C.USR
      and C.SEARCH_TEXT like replace(S.SEARCH_TEXT, '*', '%')
    where C.ID is null
  ),
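  --Pairs each incomplete search with any shorter incomplete search of the same user that it matches.
  --The left join keeps incomplete searches that have no shorter match (e.g. 'incomplete sear*').
  CHAINED_INC as (
    select LONGER.USR, LONGER.ID, LONGER.SEARCH_TEXT, SHORTER.SEARCH_TEXT SEARCH_TEXT_SHORT
    from INCOMPLETE LONGER 
    left join INCOMPLETE SHORTER
      on  LONGER.USR = SHORTER.USR
      and LONGER.SEARCH_TEXT like replace(SHORTER.SEARCH_TEXT, '*', '%')
      and LONGER.ID <> SHORTER.ID
  )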
--If a text is not the shorter text of any other record, it is the longest text for that pattern.
select distinct T1.USR, T1.SEARCH_TEXT  
from CHAINED_INC T1 
left join CHAINED_INC T2
  on  T1.USR = T2.USR
  and T1.SEARCH_TEXT = T2.SEARCH_TEXT_SHORT
where T2.SEARCH_TEXT_SHORT is null
--Finally, union the completed texts back in.
union all
select USR, SEARCH_TEXT from COMPLETE
;

Edit: removed ID from the select.
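To check the CTE approach without Oracle or PostgreSQL, this Python sketch runs it against the question's sample rows in an in-memory SQLite database. It keeps the answer's names (USR for the user column) and uses a LEFT JOIN in CHAINED_INC that also compares the user, so an incomplete search with no shorter incomplete match, such as "incomplete sear*", still reaches the final result. The rows are sorted in Python because the query itself has no ORDER BY.

```python
import sqlite3

# Rows from the question, using the answer's column name USR: (ID, USR, SEARCH_TEXT).
ROWS = [
    (1, "user1", "data manag*"),
    (2, "user1", "confer*"),
    (3, "user1", "incomplete sear*"),
    (4, "user1", "data managem*"),
    (5, "user1", "conference c*"),
    (6, "user1", "data management"),
    (7, "user1", "conference call"),
    (8, "user1", "status in*"),
    (9, "user1", "status information"),
]

QUERY = """
with
  -- completed searches
  COMPLETE as (select * from SEARCH_TABLE where SEARCH_TEXT not like '%*'),
  -- incomplete searches without a completed match for the same user
  INCOMPLETE as (
    select S.*
    from SEARCH_TABLE S
    left join COMPLETE C
      on  S.USR = C.USR
      and C.SEARCH_TEXT like replace(S.SEARCH_TEXT, '*', '%')
    where C.ID is null
  ),
  -- each incomplete search paired with any shorter one it matches (or NULL)
  CHAINED_INC as (
    select LONGER.USR, LONGER.ID, LONGER.SEARCH_TEXT, SHORTER.SEARCH_TEXT as SEARCH_TEXT_SHORT
    from INCOMPLETE LONGER
    left join INCOMPLETE SHORTER
      on  LONGER.USR = SHORTER.USR
      and LONGER.SEARCH_TEXT like replace(SHORTER.SEARCH_TEXT, '*', '%')
      and LONGER.ID <> SHORTER.ID
  )
-- keep incomplete texts that are not the shorter text of another record
select distinct T1.USR, T1.SEARCH_TEXT
from CHAINED_INC T1
left join CHAINED_INC T2
  on  T1.USR = T2.USR
  and T1.SEARCH_TEXT = T2.SEARCH_TEXT_SHORT
where T2.SEARCH_TEXT_SHORT is null
union all
select USR, SEARCH_TEXT from COMPLETE
"""


def run_chain(rows):
    """Run the CTE query on the given rows and return (USR, SEARCH_TEXT) pairs, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE SEARCH_TABLE (ID INTEGER, USR TEXT, SEARCH_TEXT TEXT)")
        conn.executemany("INSERT INTO SEARCH_TABLE VALUES (?, ?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


for usr, text in run_chain(ROWS):
    print(usr, text)
```

With the sample rows it prints the same four texts as the expected output, including "incomplete sear*".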