Question

I have some strings that look like this:

'george hughes and steve jones'

I want to split this into two strings. The way I am doing it is:

select regexp_split_to_table('george hughes and steve jones and dennis lowe',' and ') into names;

which returns george hughes, steve jones and dennis lowe as separate rows.

However, I also have some strings that look like this:

'john and mark jackson'

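These two people share the same last name, but the function above returns john and mark jackson rather than john jackson and mark jackson.

Is there any way to apply some logic to the regexp function so that, if the split word (' and ' in this case) has only one word before it, some different function is used?

This would leave the first example working as it does now, but the second example would split into john jackson and mark jackson, because I would use a different function that appends the last word of the full string (jackson) to the split part that contains only one word (john).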

Answer 1

You can try the following, which fills in all the missing last names:

SELECT regexp_replace(
  'tim price and neil and adam sutcliffe and clive johnson and john and mark jackson',
  '(?<=^| and )(\w+?) and (\w+?) (?!and )(\w+?)(?=$| )',
  '\1 \3 and \2 \3',
  'g'
);

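The pattern matches only at the start of the text or right after ' and '. It looks for a word followed by ' and ', then another word, then a word that is not 'and', followed by either the end of the text or another space. The replacement then inserts that last name after the first of the two given names. The 'g' is the global flag, which makes the replacement continue past the first match.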

You can then split the result using your original method.
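
Putting the two steps together might look like the following sketch. It reuses the pattern from this answer unchanged and assumes a PostgreSQL version whose regular expression engine supports lookbehind constraints, as well as the default standard_conforming_strings = on so the backslashes are taken literally. The output column alias full_name is only illustrative.

```sql
-- Step 1 (inner call): fill in the missing last names.
-- Step 2 (outer call): split on ' and ' exactly as in the question.
SELECT regexp_split_to_table(
         regexp_replace(
           'john and mark jackson',
           '(?<=^| and )(\w+?) and (\w+?) (?!and )(\w+?)(?=$| )',
           '\1 \3 and \2 \3',
           'g'
         ),
         ' and '
       ) AS full_name;
```

Because the replacement runs before the split, strings in which every name already has a last name pass through the inner call untouched and split as before.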

Answer 2

An alternative to a complicated regular expression:

select    name.f
       || ' '
       -- If no last name, use the next one in the list
       || coalesce(nullif(name.l,''),lead(name.l) over ())
          as full_name
from regexp_split_to_table('tim price and neil and adam sutcliffe and clive johnson and john and mark jackson',' and ') list(name)

     -- Find the position of the space separating first and last name.  If no last name, set to one char past first name
     join lateral (select coalesce(nullif(position(' ' in list.name),0),char_length(list.name)+1)) delim(pos) on true

     -- Return first and last names separately
     join lateral (
       select left(list.name,delim.pos-1)
             ,overlay(list.name placing '' from 1 for delim.pos)
     ) name(f,l) on true
;

This returns:

   full_name    
----------------
 tim price
 neil sutcliffe
 adam sutcliffe
 clive johnson
 john jackson
 mark jackson
(6 rows)
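
One caveat: lead(...) over () has no ORDER BY, so the query above depends on the rows reaching the window function in split order, which SQL does not guarantee. A hedged variation that makes the order explicit with WITH ORDINALITY is sketched below. It is a simplification rather than the answer's exact query: it assumes every name has at most two words, so split_part can stand in for the position/overlay logic, and the alias ord is introduced here purely for illustration.

```sql
SELECT split_part(list.name, ' ', 1)
       || ' '
       -- If there is no last name, take the one from the next piece in split order
       || coalesce(
            nullif(split_part(list.name, ' ', 2), ''),
            lead(split_part(list.name, ' ', 2)) OVER (ORDER BY list.ord)
          ) AS full_name
FROM regexp_split_to_table(
       'tim price and neil and adam sutcliffe and clive johnson and john and mark jackson',
       ' and '
     ) WITH ORDINALITY AS list(name, ord)  -- ord numbers the pieces 1, 2, 3, ...
ORDER BY list.ord;
```

Like the original query, this looks only one piece ahead, so two consecutive pieces without a last name would still leave the first of them incomplete.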

Splitting a string based on some logic in psql