Parse a string based on the last occurrence of a delimiter (a space, in this case)

Asked: 2014-06-24 18:25:26

Tags: sql teradata

How do I parse a string on its last delimiter?

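I have name data stored in a varchar column in Teradata. I don't know how long a name may be or how many pieces it may have: a given name, potentially several middle names (or none), a last name, and so on.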

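I want to parse the string on the assumption that everything after the last space in the name is the last name. Does anyone have a better idea than mine?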

Here is my solution (it's hack-y, but it works, and it avoids recursion, loops, UDFs, etc.):

drop table tmp;
create volatile table tmp (str1 varchar(50)) on commit preserve rows;
insert into tmp values('mortecai ali von allen o''shae');
insert into tmp values('hillary rodham-clinton');
insert into tmp values('cher');
insert into tmp values('a.e. schatzschneider');

select str1
,length(str1)-length(oreplace(str1,' ','')) as occurrence
,(1-ABS(occurrence-0.1)/(occurrence-0.1))/2 
as if_occurence_is_0_return_1  
-- this just to handle the case that there are no spaces in the string at all
-- in the case of no spaces, assumes whole field is just last name
,occurrence+if_occurence_is_0_return_1
,instr(str1,' ',1,occurrence+if_occurence_is_0_return_1) as lastspace
,substr(str1,1,lastspace) as first_nm
,substr(str1,lastspace,length(str1)-lastspace+1) as last_nm
from tmp;

--pulling it all together 
--(just str1, first_nm & last_nm - no intermediate placeholder fields):
select str1
,substr(str1,1,instr(str1,' ',1,length(str1)-length(oreplace(str1,' ',''))
+(1-ABS(length(str1)-length(oreplace(str1,' ',''))-0.1)/(length(str1)
-length(oreplace(str1,' ',''))-0.1))/2)) as first_nm
,substr(str1,instr(str1,' ',1,length(str1)-length(oreplace(str1,' ',''))
+(1-ABS(length(str1)-length(oreplace(str1,' ',''))-0.1)/(length(str1)
-length(oreplace(str1,' ',''))-0.1))/2),length(str1)-instr(str1,' ',1,length(str1)
-length(oreplace(str1,' ',''))+(1-ABS(length(str1)
-length(oreplace(str1,' ',''))-0.1)/(length(str1)
-length(oreplace(str1,' ',''))-0.1))/2)+1) as last_nm
from tmp;
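
The 0/1 flag in the queries above is a sign trick: subtracting 0.1 from the space count gives a value that is negative only when the count is 0, and dividing its absolute value by itself yields -1 or +1. A minimal sketch with literal counts (0 and 3 are illustrative values, not taken from the table) shows the two cases in isolation; the column names are made up for this example:

```sql
-- Sketch only: literal space counts stand in for the "occurrence" column.
-- count = 0: (1 - ABS(-0.1)/(-0.1))/2 = (1 - (-1))/2, which should come out as 1
-- count = 3: (1 - ABS(2.9)/(2.9))/2   = (1 - 1)/2,    which should come out as 0
select (1-ABS(0-0.1)/(0-0.1))/2 as flag_when_no_spaces
,(1-ABS(3-0.1)/(3-0.1))/2 as flag_when_three_spaces;
```

Adding the flag to the count therefore asks INSTR for occurrence 1 when there are no spaces at all, instead of occurrence 0.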

1 answer:

Answer 0 (score: 1)

Since you're using INSTR, you're probably on TD14.

You should check INSTR's parameters: with a negative start position it searches backwards from the end of the string :-)

trim(substring(str1 from instr(str1,' ',-1,1))) as last_nm

The TRIM gets rid of the leading space.

And the first name is

trim(substring(str1 from 1 for instr(str1,' ',-1,1))) as first_nm,
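
Put together against the question's test data, the two expressions might look like the sketch below. It assumes the volatile table tmp(str1) from the question and a Teradata release where INSTR accepts a negative start position; the lastspace column is only there to make the intermediate value visible.

```sql
-- Sketch only: assumes tmp(str1) from the question and Teradata 14 or later.
select str1
,instr(str1,' ',-1,1) as lastspace  -- position of the last space; 0 if there is none
,trim(substring(str1 from 1 for instr(str1,' ',-1,1))) as first_nm
,trim(substring(str1 from instr(str1,' ',-1,1))) as last_nm
from tmp;
```

For a single-word value such as 'cher', INSTR returns 0, so it is worth checking on your own system that first_nm comes back empty and last_nm holds the whole string, as you would want.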

Of course, you could also use a regular expression:

REGEXP_SUBSTR(str1, '[^ ]+$') as lst_nm,
REGEXP_SUBSTR(str1, '.*[ ]') as first_nm
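
As a full statement, the regular-expression variant might be written as follows. This is a sketch under the same assumptions (the tmp(str1) table from the question, Teradata 14 or later); the TRIM around first_nm is an addition here, because the pattern '.*[ ]' keeps the trailing space in its match.

```sql
-- Sketch only: assumes tmp(str1) from the question.
select str1
,regexp_substr(str1, '[^ ]+$') as last_nm       -- run of non-space characters at the end
,trim(regexp_substr(str1, '.*[ ]')) as first_nm -- everything up to and including the last space
from tmp;
```

REGEXP_SUBSTR returns NULL when the pattern does not match, so a name without any space should give a NULL first_nm rather than an empty string; wrap it in COALESCE if that distinction matters.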