我在C#中编写Levenshtein距离函数来计算两个字符串之间的编辑距离。问题是我想使用不同的排序规则多次调用该方法,但只有一个排序规则可以跨越SQL到CLR接口 - 这是数据库的默认排序规则。
以下是CLR功能的代码:
[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB)
{
// get a collation-aware comparer so string/character comparisons
// will match the inputs' specified collation
var aCompareInfo = textA.CompareInfo;
var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
var aLength = textA.Value.Length;
var bLength = textB.Value.Length;
// degenerate cases
if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }
if (aLength == 0) { return bLength; }
if (bLength == 0) { return aLength; }
// create two work vectors of integer distances
var previousDistances = new SqlInt64[Maximum(aLength, bLength) + 1];
var currentDistances = new SqlInt64[Maximum(aLength, bLength) + 1];
// initialize previousDistances (the previous row of distances)
// this row is A[0][i]: edit distance for an empty textA
// the distance is just the number of characters to delete from textB
for (var i = 0; i < previousDistances.Length; i++)
{
previousDistances[i] = i;
}
for (var i = 0; i < aLength; i++)
{
// calculate currentDistances from the previous row previousDistances
// first element of currentDistances is A[i+1][0]
// edit distance is delete (i+1) chars from textA to match empty textB
currentDistances[0] = i + 1;
// use formula to fill in the rest of the row
for (var j = 0; j < bLength; j++)
{
var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;
currentDistances[j + 1] = Minimum(currentDistances[j] + 1, previousDistances[j + 1] + 1, previousDistances[j] + cost);
}
// copy currentDistances to previousDistances for next iteration
for (var j = 0; j < previousDistances.Length; j++)
{
previousDistances[j] = currentDistances[j];
}
}
return currentDistances[bLength];
}
将CLR程序集部署到SQL Server(2008 R2)并将其调用为:
print dbo.LevenshteinDistance('abc' collate Latin1_General_CI_AI, 'ABC' collate Latin1_General_CI_AI)
print dbo.LevenshteinDistance('abc' collate Latin1_General_CS_AS_KS_WS, N'ABC' collate Latin1_General_CS_AS_KS_WS)
两个调用都返回零(0)。因为我为第二次调用指定了区分大小写的排序规则,所以我希望第二次调用返回三(3)。
在SQL Server中使用CLR函数,是否可以指定除数据库默认值之外的排序规则并在CLR函数中使用它们?如果是这样,怎么样?
答案 0 :(得分:2)
如果您的解决方案不涉及大于4K的字符串,还有一些方法可能会考虑更好。使您的数据类型&#39;对象&#39;而不是SqlString。这相当于SQL_VARIANT。虽然变体比标准类型产生更多的开销,但它们可以保存任意排序的字符串。
SELECT dbo.ClrCollationTest(N'Anything' collate latin1_general_cs_as),
dbo.ClrCollationTest(N'Anything' collate SQL_Latin1_General_CP1_CI_AS);
当CLR编码时,上面分别返回0和1:
public static SqlBoolean ClrCollationTest(object anything)
{
if (anything is SqlString)
return new SqlBoolean(((SqlString)anything).SqlCompareOptions.HasFlag(SqlCompareOptions.IgnoreCase));
else throw new ArgumentException(anything.GetType().Name + " is not a valid parameter data type. SqlString is required.");
}
答案 1 :(得分:1)
在Internet上没有看到任何替代方案或对此问题的回答,我决定将所需的排序规则属性指定为函数参数,并根据输入或默认排序规则选择CultureInfo
对象和CompareOptions
从数据库传入。
[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB, int? lcid, bool? caseInsensitive, bool? accentInsensitive, bool? kanaInsensitive, bool? widthInsensitive)
{
// get a collation-aware comparer so string/character comparisons
// will match the inputs' specified collation
//var aCompareInfo = textA.CompareInfo;
var aCompareInfo = CultureInfo.GetCultureInfo(lcid ?? textA.LCID).CompareInfo;
//var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
var compareOptions = GetCompareOptions(caseInsensitive, accentInsensitive, kanaInsensitive, widthInsensitive);
// ... more code ...
// first comparison
if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }
// ... more code ...
var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;
// ... more code ...
}
private static CompareOptions GetCompareOptions(bool? caseInsensitive, bool? accentInsensitive, bool? kanaInsensitive, bool? widthInsensitive)
{
var compareOptions = CompareOptions.None;
compareOptions |= (caseInsensitive ?? false) ? CompareOptions.IgnoreCase : CompareOptions.None;
compareOptions |= (accentInsensitive ?? false) ? CompareOptions.IgnoreNonSpace : CompareOptions.None;
compareOptions |= (kanaInsensitive ?? false) ? CompareOptions.IgnoreKanaType : CompareOptions.None;
compareOptions |= (widthInsensitive ?? false) ? CompareOptions.IgnoreWidth : CompareOptions.None;
return compareOptions;
}
更新我的程序集和UDF声明后,我可以像这样调用函数:
print dbo.LevenshteinDistance('abc', 'ABC', null, 1, 1, 1, 1)
print dbo.LevenshteinDistance('abc', 'ABC', null, 0, 0, 0, 0)
现在第一个调用返回0(数据库默认文化,一切都是Insensitive),而第二个调用返回3(数据库默认文化,一切都敏感)。
答案 2 :(得分:1)
如何在CLR函数中获取SQL String的排序规则?
不幸的是你做不到。根据{{3}}的TechNet页面,在“参数整理”部分中:
当您创建公共语言运行时(CLR)例程,并且绑定到例程的CLR方法的参数的类型为 SqlString 时,SQL Server将使用默认排序规则创建参数的实例包含调用例程的数据库。如果参数不是 SqlType (例如,字符串而不是 SqlString ),则数据库中的排序规则信息与参数无关
因此,您目睹的关于CompareInfo
输入参数的SqlCompareOptions
和textA
属性的行为虽然令人遗憾/令人沮丧/难以理解,但至少与文档说明如何一致系统应该正常工作。
因此,您通过单独的输入参数传递属性的解决方案是可行的方法(尽管真的应该使用SqlInt32
和SqlBoolean
的SqlTypes; - 。)