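I am using NUnit v2.5 to compare strings that contain composite Unicode characters. The comparison itself works fine, but the caret that marks the first difference appears to be misplaced.
UPD: I ended up overriding EqualConstraint, which in turn calls a custom TextMessageWriter, so I no longer need an answer. See the solution below.
Here is the snippet: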
string s1 = "ใช้งานง่าย";
string s2 = "ใช้งานงาย";
Assert.That(s1, Is.EqualTo(s2));
This is the output:
Expected: "ใช้งานงาย"
But was: "ใช้งานง่าย"
------------------^
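The arrow that marks the first differing character appears to be off by 2 positions (there are several diacritics above the text). For longer strings this becomes really painful.
I tried String.Normalize(), but it did not help.
How can I overcome this? Thanks for your help. See the answer below.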
Answer 0: (score: 1)
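When comparing Unicode strings, you must always normalize both sides of the comparison, and in the same way. A binary comparison of s1 and s2 is not enough, because canonically equivalent strings do not necessarily test as binary-equal.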
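Assuming there are four trivial normalization functions, one for each of the four normalization forms, you would want to test NFD(s1) for binary equality against NFD(s2). It does not matter whether you use NFD or NFC there, but you must do the same thing to both strings.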
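A minimal sketch of that comparison in C#, assuming .NET's built-in string.Normalize stands in for the NFD function mentioned above. The UnicodeCompare class and CanonicallyEqual method are names made up for this illustration:

```csharp
using System;
using System.Text;

public static class UnicodeCompare
{
    // Hypothetical helper: returns true when a and b are canonically equivalent.
    public static bool CanonicallyEqual(string a, string b)
    {
        // Normalize both sides to the same form (NFD here; NFC would work equally well).
        string na = a.Normalize(NormalizationForm.FormD);
        string nb = b.Normalize(NormalizationForm.FormD);

        // Only now is a code-unit-by-code-unit comparison meaningful.
        return string.Equals(na, nb, StringComparison.Ordinal);
    }
}
```

With this helper, CanonicallyEqual("u\u0308", "\u00FC") is true, while an ordinal comparison of the same two raw strings is false.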
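As for the k-compat functions, NFKC and NFKD, these are useful when doing string searches, because they improve recall at some cost in precision. For example, NFKD("™") will equal NFKD("TM"). This is what Adobe Reader does when you search a document: it always runs the search in k-compat mode, so that your search is more likely to find things. However, unlike NFC and NFD, the k-compat functions NFKC and NFKD lose information and are not reversible. With plain NFD and NFC, you can always get back to the other.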
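The trademark example can be sketched in C# as follows, again assuming string.Normalize as the normalization function. CompatSearch, LooselyEqual and ContainsLoosely are hypothetical names used only for this illustration:

```csharp
using System;
using System.Text;

public static class CompatSearch
{
    // Hypothetical helper: equality under compatibility decomposition (NFKD).
    public static bool LooselyEqual(string a, string b)
    {
        return string.Equals(
            a.Normalize(NormalizationForm.FormKD),
            b.Normalize(NormalizationForm.FormKD),
            StringComparison.Ordinal);
    }

    // Hypothetical helper: substring search carried out on the NFKD forms of both strings.
    public static bool ContainsLoosely(string haystack, string needle)
    {
        string h = haystack.Normalize(NormalizationForm.FormKD);
        string n = needle.Normalize(NormalizationForm.FormKD);
        return h.IndexOf(n, StringComparison.Ordinal) >= 0;
    }
}
```

Here LooselyEqual("\u2122", "TM") is true, whereas normalizing "\u2122" with FormD leaves it unchanged, which is why the canonical forms alone would not match. The NFKD result cannot be turned back into the trademark sign, which is the loss of information described above.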
Answer 1: (score: 0)
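You should be able to use the code from this answer to convert each string into an escaped version of the original. A composite character becomes a single escaped \u code point, while combining characters become a sequence of such escapes. Then run the Assert on those escaped versions of the strings.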
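The linked code is not reproduced here, so the following is only a stand-in sketch of the idea, not that answer's implementation. The UnicodeEscaper class and its Escape method are hypothetical; it turns every char outside printable ASCII into a \uXXXX escape so that each code unit takes a fixed, visible width:

```csharp
using System;
using System.Text;

public static class UnicodeEscaper
{
    // Hypothetical helper: replaces each char outside printable ASCII with \uXXXX.
    public static string Escape(string s)
    {
        var sb = new StringBuilder(s.Length * 2);
        foreach (char c in s)
        {
            if (c < 0x20 || c > 0x7E)
                sb.Append("\\u").Append(((int)c).ToString("X4")); // four hex digits per code unit
            else
                sb.Append(c); // printable ASCII is kept as-is
        }
        return sb.ToString();
    }
}
```

Asserting on Escape(s1) and Escape(s2) makes the caret line up, because the failure message then contains only ASCII characters.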
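Answer 2: (score: 0)
I don't think I will get a better answer, so I am answering my own question.
Cause.
Many languages use non-spacing modifiers to represent characters. For European languages there are precomposed alternatives, e.g. "u" (U+0075) + combining diaeresis (U+0308) = "ü" (U+00FC). In such cases, @tchrist's solution is sufficient.
For complex writing systems, however, there is no alternative to non-spacing modifiers. NUnit's TextMessageWriter.WriteCaretLine(int mismatch) treats the mismatch parameter as a raw char offset into the string, while the on-screen representation of a Thai string may be shorter than the caret line ("-----^") built from that offset.
Solution.
Force WriteCaretLine(int mismatch) to respect non-spacing modifiers: reduce the mismatch value by the number of non-spacing modifiers that occur before this offset.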
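The adjustment can be sketched in isolation like this. It is a simplified illustration rather than the code used in the implementation; CaretColumn and ToDisplayColumn are hypothetical names, and it assumes that every char in the NonSpacingMark category occupies no column of its own:

```csharp
using System;
using System.Globalization;

public static class CaretColumn
{
    // Hypothetical helper: converts a char index into a display column by
    // subtracting the non-spacing marks that occur before that index.
    public static int ToDisplayColumn(string s, int mismatch)
    {
        int column = mismatch;
        int limit = Math.Min(mismatch, s.Length);
        for (int i = 0; i < limit; i++)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(s[i]) == UnicodeCategory.NonSpacingMark)
                column--; // the mark is drawn over the previous base character
        }
        return column;
    }
}
```

For the expected string from the question, ใช้งานงาย, the first mismatch is at char index 7. One tone mark precedes that index, so the helper returns 6, which is where the caret belongs.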
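Implement all the supplementary classes that are needed just to get the new code called.
Along with Thai, I tested it with Devanagari and Tibetan. It works as expected.
One more gotcha. If, like me, you use NUnit in Visual Studio via ReSharper, you have to configure the fonts of Internet Explorer (this cannot be managed from R#) so that it uses a proper monospaced font for Thai, Devanagari, etc.
Implementation.
1. Subclass TextMessageWriter and override its DisplayStringDifferences.
2. Write your own ClipExpectedAndActual and FindMismatchPosition. This is where non-spacing modifiers get respected; proper clipping is needed because it can also affect the count of non-spacing elements.
3. Subclass EqualConstraint and override its WriteMessageTo(MessageWriter writer) so that your MessageWriter is used.
4. Optionally, add a small static factory for the custom constraint (ThaiText.IsEqual in the code), so that a test can call Assert.That(s1, ThaiText.IsEqual(s2)).
The source code follows. About 80% of it does nothing useful, but it has to be included because of the access levels in the original code.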
// Step 1.
using System;
using System.Globalization;
using NUnit.Framework;
using NUnit.Framework.Constraints;

public class ThaiMessageWriter : TextMessageWriter
{
    /// <summary>
    /// A copy of the original method taken from the NUnit sources, except that it
    /// changes the meaning of <paramref name="mismatch"/> before the caret line is displayed.
    /// </summary>
    /// <remarks>
    /// The incoming <paramref name="mismatch"/> is a char offset into the string, while the caret
    /// has to be positioned in character placeholder (display column) units. The two differ when
    /// the string contains marks drawn above or below a base character, such as an acute accent
    /// or the vowel and tone marks of a complex script (Thai).
    /// </remarks>
    /// <param name="clipping">Whether long strings are clipped to fit the line.</param>
    public override void DisplayStringDifferences(string expected, string actual, int mismatch, bool ignoreCase, bool clipping)
    {
        // Maximum string we can display without truncating
        int maxDisplayLength = MaxLineLength
            - PrefixLength // Allow for prefix
            - 2;           // 2 quotation marks
        int mismatchOffset = mismatch;
        if (clipping)
            MsgUtils2.ClipExpectedAndActual(ref expected, ref actual, maxDisplayLength, mismatchOffset);
        expected = MsgUtils.EscapeControlChars(expected);
        actual = MsgUtils.EscapeControlChars(actual);
        // The mismatch position may have changed due to clipping or white space conversion,
        // so recompute it, this time in placeholder units.
        int mismatchInCharPlaceholders = MsgUtils2.FindMismatchPosition(expected, actual, 0, ignoreCase);
        Write(Pfx_Expected);
        WriteExpectedValue(expected);
        if (ignoreCase)
            WriteModifier("ignoring case");
        WriteLine();
        WriteActualLine(actual);
        //DisplayDifferences(expected, actual);
        if (mismatch >= 0)
            WriteCaretLine(mismatchInCharPlaceholders);
    }
    // Copied because the original is private
    /// <summary>
    /// Write the generic 'Actual' line for a constraint
    /// </summary>
    /// <param name="constraint">The constraint for which the actual value is to be written</param>
    private void WriteActualLine(Constraint constraint)
    {
        Write(Pfx_Actual);
        constraint.WriteActualValueTo(this);
        WriteLine();
    }

    // Copied because the original is private
    /// <summary>
    /// Write the generic 'Actual' line for a given value
    /// </summary>
    /// <param name="actual">The actual value causing a failure</param>
    private void WriteActualLine(object actual)
    {
        Write(Pfx_Actual);
        WriteActualValue(actual);
        WriteLine();
    }

    // Copied because the original is private
    private void WriteCaretLine(int mismatch)
    {
        // We subtract 2 for the initial 2 blanks and add back 1 for the initial quote
        WriteLine(" {0}^", new string('-', PrefixLength + mismatch - 2 + 1));
    }
}
// Step 2.
public static class MsgUtils2
{
    private static readonly string ELLIPSIS = "...";

    /// <summary>
    /// Almost a copy of the MsgUtils.ClipExpectedAndActual method
    /// </summary>
    public static void ClipExpectedAndActual(ref string expected, ref string actual, int maxDisplayLength, int mismatch)
    {
        // Case 1: Both strings fit on line
        int maxStringLength = Math.Max(expected.Length, actual.Length);
        if (maxStringLength <= maxDisplayLength)
            return;
        // Case 2: Assume that the tail of each string fits on line
        int clipLength = maxDisplayLength - ELLIPSIS.Length;
        int clipStart = maxStringLength - clipLength;
        // Case 3: If it doesn't, center the mismatch position
        if (clipStart > mismatch)
            clipStart = Math.Max(0, mismatch - clipLength / 2);
        // Shift both clipStart and maxDisplayLength if they would split off a non-placeholding symbol
        AdjustForNonPlaceholdingCharacter(expected, ref clipStart);
        AdjustForNonPlaceholdingCharacter(expected, ref maxDisplayLength);
        expected = MsgUtils.ClipString(expected, maxDisplayLength, clipStart);
        actual = MsgUtils.ClipString(actual, maxDisplayLength, clipStart);
    }

    // Moves index back while it points at a non-spacing mark, so that clipping never
    // separates a mark from its base character. An index past the end of the string is
    // left alone (the original indexed the string without checking the upper bound).
    private static void AdjustForNonPlaceholdingCharacter(string expected, ref int index)
    {
        while (index > 0 && index < expected.Length
            && CharUnicodeInfo.GetUnicodeCategory(expected[index]) == UnicodeCategory.NonSpacingMark)
        {
            index--;
        }
    }
    // Returns the first mismatch position in placeholder (display column) units,
    // i.e. counting only chars that are not non-spacing marks.
    public static int FindMismatchPosition(string expected, string actual, int istart, bool ignoreCase)
    {
        int length = Math.Min(expected.Length, actual.Length);
        string s1 = ignoreCase ? expected.ToLower() : expected;
        string s2 = ignoreCase ? actual.ToLower() : actual;
        int iSpacingCharacters = 0;
        // Count the placeholders that precede the start position
        for (int i = 0; i < istart; i++)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(s1[i]) != UnicodeCategory.NonSpacingMark)
                iSpacingCharacters++;
        }
        for (int i = istart; i < length; i++)
        {
            if (s1[i] != s2[i])
                return iSpacingCharacters;
            if (CharUnicodeInfo.GetUnicodeCategory(s1[i]) != UnicodeCategory.NonSpacingMark)
                iSpacingCharacters++;
        }
        //
        // Strings have same content up to the length of the shorter string.
        // Mismatch occurs because string lengths are different, so show
        // that they start differing where the shortest string ends.
        // Return the placeholder count rather than the char length, so that
        // this case uses the same units as the mismatch case above.
        //
        if (expected.Length != actual.Length)
            return iSpacingCharacters;
        //
        // Same strings : We shouldn't get here
        //
        return -1;
    }
}
// Step 3.
public class ThaiEqualConstraint : EqualConstraint
{
    // EqualConstraint keeps its expected value private, so keep our own copy
    private readonly string _expected;

    public ThaiEqualConstraint(string expected) : base(expected)
    {
        _expected = expected;
    }

    public override void WriteMessageTo(MessageWriter writer)
    {
        // Redirect output to the customized MessageWriter, then copy its text to the real writer
        var myMessageWriter = new ThaiMessageWriter();
        base.WriteMessageTo(myMessageWriter);
        writer.Write(myMessageWriter);
    }
}
// Step 4.
public static class ThaiText
{
    public static EqualConstraint IsEqual(string expected)
    {
        return new ThaiEqualConstraint(expected);
    }
}