Check if a column value is contained in another column value (SQL)?

Date: 2010-08-03 07:50:38

Tags: tsql substring

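Hey, I have two tables with many columns, and I want to find the rows where table2.someothercolumn contains the value of table1.somecolumn. For example:

table1.somecolumn has: Smith, Peter
table2.someothercolumn has: peter.smith

This should be a match. How can I do this kind of search?

Thanks :)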

2 answers:

Answer 0 (score: 2)

You could try the SOUNDEX and DIFFERENCE functions to help match string literals.

Example:

select difference('peter.green', 'Green, Peter')

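This returns 2. From the documentation:

The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.

See the SOUNDEX and DIFFERENCE topics on MSDN.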
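To make the 0 to 4 scale concrete, here is a rough Python sketch of the idea: a simplified American Soundex code per word, then a position-by-position count of matching characters between the two codes. `soundex` and `difference` are hypothetical helper names written for illustration only; SQL Server's built-in SOUNDEX and DIFFERENCE may differ in details (for example, how punctuation and multi-word strings are handled), so do not expect identical numbers.

```python
import string

# Soundex digit for each consonant; vowels, Y, H and W get no digit.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Simplified American Soundex: first letter + 3 digits, zero-padded."""
    letters = [ch for ch in word.upper() if ch in string.ascii_uppercase]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # skip letters without a digit and repeats of the previous digit
        if digit and digit != prev:
            result += digit
        # H and W do not separate equal digits; vowels do (they reset prev)
        if ch not in "HW":
            prev = digit
    return (result + "000")[:4]


def difference(a: str, b: str) -> int:
    """Count positions where the two Soundex codes agree (0 to 4)."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(difference("Robert", "Rupert"))        # 4
print(difference("Green", "Smith"))          # 1
```

Because Soundex keys mostly on consonant sounds, two spellings of the same surname usually score 4, while unrelated words tend to agree on few positions.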

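Update

SOUNDEX and DIFFERENCE may not work well when the words appear in a different order, but if you have the Full-Text Search feature installed, you can use the full-text engine's word-breaking and parsing functionality without creating an index. Assuming you are using SQL Server 2008, the following function returns a list of normalized terms: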

SELECT * FROM sys.dm_fts_parser('"Peter Green"', 1033, 0, 0)

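You can CROSS APPLY this against the rest of your query.

See the sys.dm_fts_parser topic and section K, "Using APPLY", in the FROM topic for more information.

Example (SQL Server 2008 Enterprise with the full-text engine enabled):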

-- 'U' is the OBJECT_ID type code for a user table
if OBJECT_ID('Names1', 'U') is not null drop table Names1
if OBJECT_ID('Names2', 'U') is not null drop table Names2

create table Names1 
(
    id int identity(0, 1),
    name nvarchar(128)
)
insert into Names1 (name) values ('Green, Peter')
insert into Names1 (name) values ('Smith, Peter')
insert into Names1 (name) values ('Aadland, Beverly')
insert into Names1 (name) values ('Aalda, Mariann')
insert into Names1 (name) values ('Aaliyah')
insert into Names1 (name) values ('Aames, Angela')
insert into Names1 (name) values ('Aames, Willie')
insert into Names1 (name) values ('Aaron, Caroline')
insert into Names1 (name) values ('Aaron, Quinton')
insert into Names1 (name) values ('Aaron, Victor')
insert into Names1 (name) values ('Abbay, Peter')
insert into Names1 (name) values ('Abbott, Dorothy')
insert into Names1 (name) values ('Abbott, Bruce')
insert into Names1 (name) values ('Abbott, Bud')
insert into Names1 (name) values ('Abbott, Philip')
insert into Names1 (name) values ('Abdoo, Rose')
insert into Names1 (name) values ('Abdul, Paula')
insert into Names1 (name) values ('Abel, Jake')
insert into Names1 (name) values ('Abel, Walter')
insert into Names1 (name) values ('Abeles, Edward')
insert into Names1 (name) values ('Abell, Tim')
insert into Names1 (name) values ('Aber, Chuck')

create table Names2
(
    id int identity(200, 1),
    name nvarchar(128)
)
insert into Names2 (name) values (LOWER('Peter.Green'))
insert into Names2 (name) values (LOWER('Peter.Smith'))
insert into names2 (name) values (LOWER('Beverly.Aadland'))
insert into names2 (name) values (LOWER('Mariann.Aalda'))
insert into names2 (name) values (LOWER('Aaliyah'))
insert into names2 (name) values (LOWER('Angela.Aames'))
insert into names2 (name) values (LOWER('Willie.Aames'))
insert into names2 (name) values (LOWER('Caroline.Aaron'))
insert into names2 (name) values (LOWER('Quinton.Aaron'))
insert into names2 (name) values (LOWER('Victor.Aaron'))
insert into names2 (name) values (LOWER('Peter.Abbay'))
insert into names2 (name) values (LOWER('Dorothy.Abbott'))
insert into names2 (name) values (LOWER('Bruce.Abbott'))
insert into names2 (name) values (LOWER('Bud.Abbott'))
insert into names2 (name) values (LOWER('Philip.Abbott'))
insert into names2 (name) values (LOWER('Rose.Abdoo'))
insert into names2 (name) values (LOWER('Paula.Abdul'))
insert into names2 (name) values (LOWER('Jake.Abel'))
insert into names2 (name) values (LOWER('Walter.Abel'))
insert into names2 (name) values (LOWER('Edward.Abeles'))
insert into names2 (name) values (LOWER('Tim.Abell'))
insert into names2 (name) values (LOWER('Chuck.Aber'));

with ftsNamesFirst (id, term) as
(
    select id, terms.display_term
        from names1 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
), ftsNamesSecond (id, term) as
(
select id, terms.display_term
        from names2 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
)
select * from 
(
    select 
    ROW_NUMBER() over (partition by nfirst.id order by sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) desc) ranking,
    sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) Confidence,
    nFirst.id Names1ID,
    nFirst.name Names1Name, 
    nSecond.id Names2ID,
    nSecond.name Names2Name
    from 
    ftsNamesFirst cross join ftsNamesSecond 
    left outer join names1 nFirst on nFirst.id = ftsNamesFirst.id
    left outer join names2 nSecond on nSecond.id = ftsNamesSecond.id 
    where DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term) = 4
    group by 
        nFirst.id, nFirst.name, nSecond.id, nSecond.name
) MatchedNames 
where ranking = 1

Output:

Only the highest-confidence match for each name is kept (all other matches are filtered out by the windowed ranking query).

Confidence  Names1ID  Names1Name        Names2ID  Names2Name
8           0         Green, Peter      200       peter.green
8           1         Smith, Peter      201       peter.smith
8           2         Aadland, Beverly  202       beverly.aadland
8           3         Aalda, Mariann    203       mariann.aalda
4           4         Aaliyah           204       aaliyah
8           5         Aames, Angela     205       angela.aames
8           6         Aames, Willie     206       willie.aames

It's not perfect, but it's a good starting point that can be tuned to improve the success rate.
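The matching logic of the query above can be mirrored outside the database as a rough Python sketch. `terms` and `best_matches` are hypothetical names; the regex split is only a crude stand-in for `sys.dm_fts_parser`, and exact, case-insensitive term equality is swapped in for `DIFFERENCE(...) = 4` (equal Soundex codes), so this version is stricter than the SQL one. As in the query, every matching term pair adds 4 to the confidence, and only the best-scoring candidate per name is kept, like `ranking = 1`.

```python
import re


def terms(name: str) -> list[str]:
    # crude stand-in for sys.dm_fts_parser: lower-case words split on non-letters
    return [t for t in re.split(r"[^a-z]+", name.lower()) if t]


def best_matches(names1: list[str], names2: list[str]) -> list[tuple[str, str, int]]:
    results = []
    for n1 in names1:
        best = None
        for n2 in names2:
            # each equal term pair scores 4, like a DIFFERENCE(...) = 4 hit
            confidence = sum(4 for a in terms(n1) for b in terms(n2) if a == b)
            # keep only the highest-confidence candidate (first one wins ties)
            if confidence and (best is None or confidence > best[2]):
                best = (n1, n2, confidence)
        if best:
            results.append(best)
    return results


print(best_matches(["Green, Peter", "Aaliyah"],
                   ["peter.smith", "peter.green", "aaliyah"]))
# [('Green, Peter', 'peter.green', 8), ('Aaliyah', 'aaliyah', 4)]
```

Swapping the `a == b` test for a phonetic comparison would bring it closer to the SQL version, at the cost of more false positives.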

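Answer 1 (score: 1)

There are several possible solutions, depending on your needs:

  1. Use a helper table to store the keywords of each record (or of each record and field). For example, table_helper(id int primary key, record_id int, keyword varchar), where record_id links back to the source table. Populate this table from triggers on table1 and table2. The query to find common rows is then a simple intersection of table_helper with itself. You can create one helper for both table1 and table2, or use separate tables.
  2. Use full-text indexing.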
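Option 1 can be sketched with Python's built-in `sqlite3` module. This assumes a hypothetical `keywords` tokenizer (lower-case words split on non-letters) and two separate helper tables, `helper1` and `helper2`; the helper rows are filled directly from Python rather than by triggers, only to keep the tokenizing step visible. Joining the two helpers on `keyword` and counting rows per record pair gives the number of shared keywords, which is the "intersection" the answer describes.

```python
import re
import sqlite3


def keywords(name: str) -> set[str]:
    """Hypothetical tokenizer: distinct lower-case words, split on non-letters."""
    return {w for w in re.split(r"[^a-z]+", name.lower()) if w}


def find_matches(names1: list[str], names2: list[str]) -> list[tuple[str, str, int]]:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
        -- one row per (record, keyword); record_id links back to the source table
        CREATE TABLE helper1 (id INTEGER PRIMARY KEY, record_id INTEGER, keyword TEXT);
        CREATE TABLE helper2 (id INTEGER PRIMARY KEY, record_id INTEGER, keyword TEXT);
    """)
    for table, helper, names in (("table1", "helper1", names1),
                                 ("table2", "helper2", names2)):
        for name in names:
            cur = con.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
            con.executemany(
                f"INSERT INTO {helper} (record_id, keyword) VALUES (?, ?)",
                [(cur.lastrowid, kw) for kw in keywords(name)],
            )
    # intersect the helpers on keyword; COUNT(*) = number of shared keywords
    rows = con.execute("""
        SELECT t1.name, t2.name, COUNT(*) AS shared
        FROM helper1 h1
        JOIN helper2 h2 ON h2.keyword = h1.keyword
        JOIN table1 t1 ON t1.id = h1.record_id
        JOIN table2 t2 ON t2.id = h2.record_id
        GROUP BY t1.id, t1.name, t2.id, t2.name
        ORDER BY t1.id, shared DESC
    """).fetchall()
    con.close()
    return rows


print(find_matches(["Smith, Peter"], ["peter.smith", "bud.abbott"]))
# [('Smith, Peter', 'peter.smith', 2)]
```

In a real database the helper tables would be indexed on `keyword`, so the join stays cheap even when the source tables are large.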