Question

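I would like to retrieve the difference in the number of letters between consecutive rows of a column. For example:

If you have a value "test" and another row has the value "testing", then the difference between "test" and "testing" is 4 letters. The data value for the column would be 4.

I have thought about it and I don't know where to begin.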

id    ||  value     || category   || differences 
--------------------------------------------------
 1    ||  test      || 1          || 4
 2    ||  testing  || 1          || null   
11    ||  candy     || 2          || -3       
12    ||  ca        || 2          || null

In this case and context, there is no difference between "test" and "rest".

Answer 1

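I think what you are looking for is a measure of edit distance rather than a simple count of prefix similarity, and there are some common algorithms for that. Levenshtein's method is one I have used before, and I have seen it implemented as a TSQL function. The answers to this SO question propose some implementations in TSQL that you may be able to use as-is.

(Do take the time to test the code and understand the method, rather than just copying and using it, so that you can make sense of the output if something goes wrong. Otherwise you may be creating technical debt that you will have to pay back later.)

Exactly which distance calculation you want depends on how you want to count things: for example, whether a substitution counts as one change or as a deletion plus an insertion, whether you want to account for substring moves (if your strings are long enough for that to matter), and so on.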
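To make the substitution question concrete, here is a minimal sketch of Levenshtein distance, written in Python rather than TSQL purely for brevity. The function name `edit_distance` and the `substitution_cost` parameter are my own illustrative choices and are not taken from any of the linked implementations.

```python
def edit_distance(a: str, b: str, substitution_cost: int = 1) -> int:
    """Levenshtein-style distance between a and b.

    substitution_cost=1 counts a replaced character as a single change;
    substitution_cost=2 makes it cost the same as a deletion plus an insertion.
    """
    # previous[j] is the distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else substitution_cost
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


print(edit_distance("test", "testing"))
print(edit_distance("test", "rest"))
print(edit_distance("test", "rest", substitution_cost=2))
```

Note that under this measure "test" and "rest" do differ (by one substitution), whereas the question treats them as identical because they have the same length, so check which behaviour you actually need.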

Answer 2

I think you just want len() and lead():

select t.id, t.value, t.category,
       (len(lead(t.value) over (partition by t.category order by t.id)) -
        len(t.value)
       ) as difference
from t;

Answer 3

-- sample data
create table #temp
(
id int,
value varchar(30),
category int
)

insert into #temp
select 1,'test',1
union all
select 2,'testing',1
union all
select 1,'Candy',2
union all
select 2,'Ca',2

-- pair each row with the next value in the same category
;with cte
as
(
select id,value,category,lead(value) over (partition by category order by id) as nxtvalue
from #temp
)
-- length of what is left of the next value once the current value is removed from it
select id,value,category,len(replace(nxtvalue,value,'')) as differences
from cte
Answer 4

You read the next record with LEAD, then compare the two strings with LIKE or other string functions:

select
  id, value, category,
  case when value like next_value + '%' or next_value like value + '%' 
       then len(next_value) - len(value)
  end as differences
from
(
  select id, value, category, lead(value) over (order by id) as next_value 
  from mytable
) this_and_next;

If you only want to compare values within the same category, use a partition clause:

lead(value) over (partition by category order by id)

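Update: See DhruvJoshi's answer regarding SQL Server's LEN. As I suspected, this function does not count trailing blanks, so you need his trick if you want to count them. Here is the LEN documentation confirming this behavior: https://technet.microsoft.com/en-us/library/ms190329(v=sql.105).aspx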
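For anyone who wants to check the expected output without a SQL Server instance, the logic of the query above can be approximated in Python. This is only a sketch under a few assumptions: `prefix_differences` is a hypothetical helper, `str.startswith` stands in for the LIKE test (so collation and case-insensitivity are ignored), and Python's `len` counts trailing spaces whereas LEN does not.

```python
from itertools import groupby


def prefix_differences(rows):
    """rows: iterable of (id, value, category) tuples.

    Returns (id, value, category, difference) tuples, where difference is the
    length of the next value in the same category minus the length of this
    value, or None when there is no next row or neither value is a prefix of
    the other.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))  # by category, then id
    for _, group in groupby(ordered, key=lambda r: r[2]):
        group = list(group)
        # Pair each row with its successor, like LEAD; the last row gets None.
        for (row_id, value, category), nxt in zip(group, group[1:] + [None]):
            difference = None
            if nxt is not None:
                next_value = nxt[1]
                if value.startswith(next_value) or next_value.startswith(value):
                    difference = len(next_value) - len(value)
            result.append((row_id, value, category, difference))
    return result


rows = [(1, "test", 1), (2, "testing", 1), (11, "candy", 2), (12, "ca", 2)]
for row in prefix_differences(rows):
    print(row)
```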

Answer 5

You can also use a self-join query like the one below:

--create table tbl (id int,  value nvarchar(100), category int);
--insert into tbl values
--(1,N'test',1)
--,(2,N' testing',1)
--,(11,N'candy',2)      
--,(12,N'ca',2);
select A.*, LEN(B.value)-LEN(A.value) as difference
from tbl A LEFT JOIN tbl B on A.id +1 =B.id and A.category=B.category
--drop table tbl

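Update: I noticed that you have an oddly positioned space at the end of the value. SQL Server's LEN does not count trailing spaces when calculating length, so here is a hack for the query above: append a non-space character to both values so that any trailing spaces are counted.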

select A.*, LEN(B.value+'>')-LEN(A.value+'>') as difference
from tbl A LEFT JOIN tbl B on A.id +1 =B.id and A.category=B.category

As pointed out in the comments, the ids may not be consecutive in this case. Try this:

create table #temp (rownum int PRIMARY KEY IDENTITY(1,1), id int, value nvarchar(100), category int)
insert into #temp (id, value, category)
select id, value, category from tbl order by id asc

select A.id, A.value, A.category, LEN(B.value+'>')-LEN(A.value+'>') as difference
from #temp A LEFT JOIN #temp B on A.rownum +1 =B.rownum and A.category=B.category

Number of differences in a column

5 answers: