Remove spaces within words

Date: 2013-04-04 10:37:15

Tags: sql parsing postgresql text-parsing

I have a lot of strings in my database (PostgreSQL), for example:

with mystrings as (
    select 'H e l l o, how are you'::varchar string union all
    select 'I am fine, t h a n k you'::varchar string union all
    select 'This is s t r a n g e text'::varchar string union all
    select 'With c r a z y space b e t w e e n characters'::varchar string 
)
select * from mystrings

Is there a way to remove the spaces between the characters within words? For my example, the result should be:

Hello, how are you
I am fine, thank you
This is strange text
With crazy space between characters

I started with replace, but there are so many words with spaces between their characters that I cannot even find them all.

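Because it may be hard to join the characters meaningfully, it would be better to just get a list of concatenation candidates. With the sample data, the result should be: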

H e l l o
t h a n k
s t r a n g e
c r a z y
b e t w e e n

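Such a query should find and return every substring of a string in which at least three individual characters are separated by two spaces (continuing for as long as the pattern [space] individual character occurs):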

He l l o how are you --> llo
H e l l o how are you --> Hello
C r a z y space b e t w e e n --> {crazy, between}

3 answers:

Answer 0 (score: 1)

Based on your edited question, the following gets all candidates that have at least three individual characters separated by two spaces:

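-- Assumes mystrings has a text column named data.
SELECT 
    data || ' --> {' || replace_candidates || '}'
FROM(
SELECT 
    data,
    ( SELECT 
            -- join the surviving candidates into one comma-separated string
            array_to_string( array_agg( data ),',' )  
        FROM (
            SELECT 
                data,
                length( data ) 
            FROM ( 
                -- strip the spaces from each piece left between the matches
                SELECT 
                    replace( data, ' ', '' ) AS data 
                FROM 
                    -- split on every run of two or more non-space characters
                    regexp_split_to_table( data, '\S{2,}' ) AS data 
                ) t
            -- keep only pieces with at least three characters
            WHERE length( data ) > 2
        ) t ) AS replace_candidates
    FROM
        mystrings
) T
WHERE 
  replace_candidates IS NOT NULL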

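How it works

Look at the innermost query first (regexp_split_to_table):

  1. The regex matches every run of at least 2 characters in a sequence (not separated by spaces).
  2. regexp_split_to_table returns the inverse of those matches, that is, the pieces between them.
  3. The spaces in the resulting records are replaced with an empty string.
  4. The records are filtered to those with a length greater than 2.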

The rest is array aggregate functions that handle the formatting you asked for.
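
To make the steps above concrete, here is a minimal sketch that runs only the innermost part on a single literal taken from the question's sample data. The column aliases piece and squeezed are names chosen for this illustration and do not appear in the answer's query.

```sql
-- Split one sample string on every run of two or more non-space characters
-- and show each remaining piece before and after its spaces are removed.
select
    piece,
    replace(piece, ' ', '') as squeezed
from regexp_split_to_table(
         'With c r a z y space b e t w e e n characters',
         '\S{2,}'
     ) as piece;
```

The pieces left between the matched words include ' c r a z y ' and ' b e t w e e n ', and the outer query then keeps only the squeezed values that are longer than two characters.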

Result

    H e l l o how are you --> {Hello}
    I am fine, t h a n k you --> {thank}
    This is s t r a n g e text --> {strange}
    With c r a z y space b e t w e e n characters --> {crazy,between}
    SOME MORE TEST T E X T --> {TEXT}
    

    SQLFIDDLE

Note: it treats a character as [space][char][space], but you can modify it as needed for [space][space][char][space] or [space][char][special_char][space]...
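
As one possible variation on the pattern, the sketch below uses regexp_matches with PostgreSQL's word-boundary escapes \m (start of word) and \M (end of word) instead of splitting. It is not the query from this answer; it assumes the column is named string as in the question's sample data, and the alias candidate is made up for the example.

```sql
with mystrings as (
    select 'H e l l o, how are you'::varchar as string union all
    select 'I am fine, t h a n k you'::varchar as string union all
    select 'This is s t r a n g e text'::varchar as string union all
    select 'With c r a z y space b e t w e e n characters'::varchar as string
)
select
    string,
    -- \m\w\M        : one single-character word
    -- (?: \w\M){2,} : followed by at least two more, each after one space
    replace(
        (regexp_matches(string, '\m\w\M(?: \w\M){2,}', 'g'))[1],
        ' ', ''
    ) as candidate
from mystrings;
```

Because \M requires the word to end right after the single character, punctuation such as the comma after 'H e l l o' does not break the run, and a string like 'He l l o how are you' should only yield 'llo'. On the sample data the intended candidates are Hello, thank, strange, crazy and between.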

Hope this helps ;p

Answer 1 (score: 0)

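You can use a resource such as an online dictionary: if the word exists there, you do not have to remove the spaces; otherwise remove them. Alternatively, you can use a table that contains all the valid strings and check against that table. I hope you see what I mean.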

Answer 2 (score: 0)

The following finds possible concatenation candidates:

 with mystrings as (
    select 'H e l l o, how are you'::varchar string union all
    select 'I am fine, t h a n k you'::varchar string union all
    select 'This is s t r a n g e text'::varchar string union all
    select 'With c r a z y space b e t w e e n characters'::varchar string 
)

, u as (
select string, strpart[rn] as strpart, rn
from  (
   select *, generate_subscripts(strpart, 1) as rn
   from  (
      select string, string_to_array(replace(string,',',''), ' ') as strpart
      from   mystrings
      ) x
   ) y
)

,w as (
select 
   string,strpart,rn, 
   case when length(strpart) = 1 then 1 else 0 end as indchar ,
   -- lag/lead must look at the neighbouring tokens of the same string, in token order
   case when coalesce(length(lag(strpart) over (partition by string order by rn)),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart,
   case when coalesce(length(lead(strpart) over (partition by string order by rn)),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend   
from u
) 


,x as (
   select 
      string,rn,strpart,indchar,strstart,
      sum(strstart) over (order by string, rn) as strid 
   from w 
   where indchar = 1 and not (strstart = 1 and strend = 1)
    )

select string, array_to_string(array_agg(strpart),'') as candidate from x group by string, strid
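
To see what the first step of this query produces, the following minimal sketch tokenizes one literal from the sample data with the same string_to_array and generate_subscripts calls. The aliases token and is_single_char are names made up for this illustration.

```sql
-- Break one sample string into space-separated tokens, number them,
-- and flag the tokens that consist of a single character.
select
    rn,
    strpart[rn] as token,
    length(strpart[rn]) = 1 as is_single_char
from (
    select strpart, generate_subscripts(strpart, 1) as rn
    from (
        select string_to_array(replace('I am fine, t h a n k you', ',', ''), ' ') as strpart
    ) x
) y
order by rn;
```

The later steps then group consecutive single-character tokens. An isolated one such as 'I' is both the start and the end of its run, which is why the query filters those rows out.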