Question

I need to convert a 150-character free-form text field and map it to one of two values, 'Sibling' or 'Spouse', in the member_type field of a SQL Server database.

I came up with the following UPDATE statement to do it:

Update my_table
set member_type = CASE
      WHEN (relationship_description like '%brother%' OR
            relationship_description like '%sister%' OR
            relationship_description like '%sibling%')
      THEN 'Sibling'
      WHEN (relationship_description like '%spouse%' OR
            relationship_description like '%husband%' OR
            relationship_description like '%wife%')
      THEN 'Spouse'
      ELSE '' END;

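But there is one more requirement: if relationship_description contains more than one keyword, the first keyword (the one that appears earliest in the text) must be used for the mapping.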

For example, case 1: relationship_description = "Mark's brother": it contains "brother", so it should be treated as Sibling.

Case 2: relationship_description = "Walter is brother of Mark and Greg Howard. Mark is the husband of Julie": "brother" appears first, so it should be treated as Sibling.

Case 3: relationship_description = "John's Wife": it contains the keyword "wife", so it should be treated as Spouse.

Case 4: relationship_description = "John's wife and Peter's sister": it contains two keywords, "wife" and "sister"; "wife" comes first, so it should be treated as Spouse.

I have heard that the SQL function STUFF might work for this. Can anyone help? I need to do this in a SQL script, not in Java.

Answer 1

If you want to know which keyword appears first, you need PATINDEX.
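PATINDEX returns the 1-based starting position of the first match of a pattern, or 0 when the pattern does not occur, and the pattern needs % wildcards to match anywhere inside the string. A quick check against case 4 from the question shows why comparing positions (ignoring 0) picks the right keyword; the variable @desc is only for illustration.

```sql
-- Case 4 from the question: a Spouse keyword and a Sibling keyword both occur.
DECLARE @desc varchar(150) = 'John''s wife and Peter''s sister';

SELECT PATINDEX('%wife%',    @desc) AS wife_pos,      -- 8: "wife" starts at character 8
       PATINDEX('%sister%',  @desc) AS sister_pos,    -- 25
       PATINDEX('%husband%', @desc) AS husband_pos,   -- 0: not found
       PATINDEX('wife',      @desc) AS no_wildcards;  -- 0: without %, the whole string must match
```

Since 8 < 25, "wife" is the first keyword and the row maps to Spouse.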

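I'd suggest building a CTE, keyed on the source table's key, that gets the PATINDEX value for each string you want to search for. All together it looks like this: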

WITH Relations AS
(   -- keyword -> member_type mapping
    SELECT 'brother' AS searchstring, 'Sibling' AS relationship
    UNION
    SELECT 'sister', 'Sibling'
    UNION
    SELECT 'sibling', 'Sibling'
    UNION
    SELECT 'spouse', 'Spouse'
    UNION
    SELECT 'husband', 'Spouse'
    UNION
    SELECT 'wife', 'Spouse'
),
FoundPat AS
(   -- rank the keywords found in each row by where they first occur;
    -- PATINDEX needs % wildcards and returns 0 for "not found", so those rows are filtered out
    SELECT my.keyfield, r.relationship,
           RANK() OVER (PARTITION BY my.keyfield
                        ORDER BY PATINDEX('%' + r.searchstring + '%', my.relationship_description)) AS positionrank
    FROM my_table my
    CROSS APPLY Relations r
    WHERE PATINDEX('%' + r.searchstring + '%', my.relationship_description) > 0
)
UPDATE my
SET member_type = ISNULL(fp.relationship, '')   -- rows with no keyword get ''
FROM my_table my
LEFT JOIN FoundPat fp ON fp.keyfield = my.keyfield AND fp.positionrank = 1;

I haven't tested the code above, so some of the syntax may be slightly off.
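To try the CTE approach without touching a real table, here is a self-contained sketch against the question's four cases plus one row with no keyword. It assumes a hypothetical table variable @sample whose id column stands in for my_table's real key (keyfield). It also swaps RANK for ROW_NUMBER, so each key gets exactly one rank-1 row, and applies LOWER so matching does not depend on the column's collation; both are choices made for this sketch, not part of the answer above.

```sql
-- Hypothetical sample data; "id" plays the role of my_table's key column.
DECLARE @sample TABLE (
    id                       int PRIMARY KEY,
    relationship_description varchar(150),
    member_type              varchar(10)
);

INSERT INTO @sample (id, relationship_description) VALUES
(1, 'Mark''s brother'),
(2, 'Walter is brother of Mark and Greg Howard. Mark is the husband of Julie'),
(3, 'John''s Wife'),
(4, 'John''s wife and Peter''s sister'),
(5, 'Mark''s cousin');  -- no keyword: should end up as ''

WITH Relations AS
(   SELECT searchstring, relationship
    FROM (VALUES ('brother', 'Sibling'), ('sister', 'Sibling'), ('sibling', 'Sibling'),
                 ('spouse',  'Spouse'),  ('husband', 'Spouse'), ('wife',    'Spouse'))
         AS v(searchstring, relationship)
),
FoundPat AS
(   -- one row per keyword that actually occurs, numbered by first position
    SELECT s.id, r.relationship,
           ROW_NUMBER() OVER (PARTITION BY s.id
                              ORDER BY PATINDEX('%' + r.searchstring + '%',
                                                LOWER(s.relationship_description))) AS positionrank
    FROM @sample s
    CROSS APPLY Relations r
    WHERE PATINDEX('%' + r.searchstring + '%', LOWER(s.relationship_description)) > 0
)
UPDATE s
SET member_type = ISNULL(fp.relationship, '')
FROM @sample s
LEFT JOIN FoundPat fp ON fp.id = s.id AND fp.positionrank = 1;

SELECT id, relationship_description, member_type FROM @sample ORDER BY id;
```

Rows 1 and 2 should come out as Sibling, rows 3 and 4 as Spouse, and row 5 as an empty string.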

Answer 2

Try using the PATINDEX function:

DECLARE @t TABLE (relationship_description varchar(max), member_type varchar(10));
INSERT INTO @t VALUES
('Mark''s brother', NULL),
('Walter is brother of Mark and Greg Howard. Mark is the husband of Julie', NULL),
('John''s Wife', NULL),
('Johns''s wife and Peter''s sister', NULL);

UPDATE t
    SET member_type = CASE WHEN ca.pos = 0 THEN NULL        -- no keyword at all
                           WHEN ca.kind = 1 THEN 'Sibling'
                           ELSE 'Spouse'
                      END
FROM @t t
CROSS APPLY (SELECT TOP 1 v.pos, v.kind
             FROM (VALUES
                 (PATINDEX('%brother%', t.relationship_description), 1),  -- 1 = Sibling
                 (PATINDEX('%sister%',  t.relationship_description), 1),
                 (PATINDEX('%sibling%', t.relationship_description), 1),
                 (PATINDEX('%spouse%',  t.relationship_description), 2),  -- 2 = Spouse
                 (PATINDEX('%husband%', t.relationship_description), 2),
                 (PATINDEX('%wife%',    t.relationship_description), 2)) v(pos, kind)
             -- smallest non-zero position first; 0 (not found) is pushed to the end
             ORDER BY CASE WHEN v.pos = 0 THEN 1000000 ELSE v.pos END) ca;

SELECT * FROM @t;

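In the CROSS APPLY you get the position of the first occurrence of each keyword. Then you just pick the smallest position other than 0 (0 means the keyword does not occur, so for sorting it is replaced with 1000000). In the main CASE you check whether none of the words occurred, in which case you return NULL; otherwise you look at the keyword type and pick the matching description.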

Output:

relationship_description                                                  member_type
Mark's brother                                                            Sibling
Walter is brother of Mark and Greg Howard. Mark is the husband of Julie   Sibling
John's Wife                                                               Spouse
Johns's wife and Peter's sister                                           Spouse
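To see the candidate rows the CROSS APPLY chooses from, the VALUES list can be run on its own for one description. This is a diagnostic sketch using case 2 from the question and an illustrative variable @desc; the first row returned is the one TOP 1 would pick.

```sql
DECLARE @desc varchar(150) =
    'Walter is brother of Mark and Greg Howard. Mark is the husband of Julie';

-- Same (position, kind) pairs as in the answer, listed in the order TOP 1 sees them
SELECT v.pos, v.kind,
       CASE WHEN v.pos = 0 THEN 1000000 ELSE v.pos END AS sort_key
FROM (VALUES
    (PATINDEX('%brother%', @desc), 1),
    (PATINDEX('%sister%',  @desc), 1),
    (PATINDEX('%sibling%', @desc), 1),
    (PATINDEX('%spouse%',  @desc), 2),
    (PATINDEX('%husband%', @desc), 2),
    (PATINDEX('%wife%',    @desc), 2)) v(pos, kind)
ORDER BY sort_key;   -- first row: pos 11, kind 1 ("brother")
```

"husband" is also found but at a later position, and the four keywords that are absent sort last with the 1000000 sentinel.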

Update a column based on the first occurrence of text in another column