Question

How to parse a string on the last occurrence of a delimiter (here, a space).

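In Teradata, I have name data stored in a varchar column. I don't know how long a name can be or how many parts it can have: a given name, potentially multiple middle names (or none), a surname, and so on.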

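I want to parse the string on the assumption that everything after the last space in the name is the last name. Does anyone have a better idea than mine?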

Here is my solution (it's hacky, but it works, and it avoids recursion, loops, UDFs, etc.):

drop table tmp;
create volatile table tmp (str1 varchar(50)) on commit preserve rows;
insert into tmp values('mortecai ali von allen o''shae');
insert into tmp values('hillary rodham-clinton');
insert into tmp values('cher');
insert into tmp values('a.e. schatzschneider');

select str1
,length(str1)-length(oreplace(str1,' ','')) as occurrence
,(1-ABS(occurrence-0.1)/(occurrence-0.1))/2 
as if_occurence_is_0_return_1  
-- this just to handle the case that there are no spaces in the string at all
-- in the case of no spaces, assumes whole field is just last name
,occurrence+if_occurence_is_0_return_1
,instr(str1,' ',1,occurrence+if_occurence_is_0_return_1) as lastspace
,substr(str1,1,lastspace) as first_nm
,substr(str1,lastspace,length(str1)-lastspace+1) as last_nm
from tmp;

--pulling it all together 
--(just str1, first_nm & last_nm - no intermediate placeholder fields):
select str1
,substr(str1,1,instr(str1,' ',1,length(str1)-length(oreplace(str1,' ',''))
+(1-ABS(length(str1)-length(oreplace(str1,' ',''))-0.1)/(length(str1)
-length(oreplace(str1,' ',''))-0.1))/2)) as first_nm
,substr(str1,instr(str1,' ',1,length(str1)-length(oreplace(str1,' ',''))
+(1-ABS(length(str1)-length(oreplace(str1,' ',''))-0.1)/(length(str1)
-length(oreplace(str1,' ',''))-0.1))/2),length(str1)-instr(str1,' ',1,length(str1)
-length(oreplace(str1,' ',''))+(1-ABS(length(str1)
-length(oreplace(str1,' ',''))-0.1)/(length(str1)
-length(oreplace(str1,' ',''))-0.1))/2)+1) as last_nm
from tmp;

Answer 1

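Since you are using INSTR, you are probably on TD14.

You should check INSTR's parameters: with a negative start position it can also search backwards from the end of the string :-)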

trim(substring(str1 from instr(str1,' ',-1,1))) as last_nm

TRIM gets rid of the leading blank.

And the first name(s) are:

trim(substring(str1 from 1 for instr(str1,' ',-1,1))) as first_nm,

Of course, you can also use a regular expression:

REGEXP_SUBSTR(str1, '[^ ]+$') as last_nm,
REGEXP_SUBSTR(str1, '.*[ ]') as first_nm
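
To show how the fragments above fit into a full query, here is a minimal sketch that runs both variants side by side. It assumes the volatile table tmp(str1) from the question and a Teradata release where INSTR and REGEXP_SUBSTR are available (TD14 or later). The column aliases ending in _rx and the extra TRIM around the regex result are additions for illustration, not part of the original fragments.

```sql
-- Sketch only: assumes the volatile table tmp(str1) from the question
-- and Teradata 14 or later.
select str1
      -- A negative start position makes INSTR search backwards from the end,
      -- so this is the position of the last space (0 if there is none).
      ,instr(str1, ' ', -1, 1) as lastspace
      -- Everything up to and including the last space, blanks trimmed.
      ,trim(substring(str1 from 1 for lastspace)) as first_nm
      -- Everything from the last space onward, leading blank trimmed.
      ,trim(substring(str1 from lastspace)) as last_nm
      -- Regex variant: greedy match up to the last space, then trim it off.
      ,trim(regexp_substr(str1, '.*[ ]')) as first_nm_rx
      -- Regex variant: the trailing run of non-space characters.
      ,regexp_substr(str1, '[^ ]+$') as last_nm_rx
from tmp;
```

For a value with no space at all, such as 'cher', INSTR returns 0, so the INSTR variant should yield an empty first name and the whole string as the last name. In the regex variant the first-name pattern has nothing to match, so that column should come back NULL rather than empty. If the distinction matters, wrapping it in COALESCE is one option.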
