Can string.Compare ever return 0 for genuinely unequal strings?

Asked: 2015-05-12 23:22:21

Tags: c# string string-comparison

I was thinking about the mathematics of how string.Compare works in C#.

Is it possible for two unequal strings to ever return 0 on this method call?

I'm referring to strings that are genuinely unequal, such as "Herp" and "Derp", not "Herp" and "Hěrp".

Unfortunately, apart from the basic null cases, the implementation of string.Compare is all internal, in code that lives outside of .NET.

I believe this is the actual C++ code used for this, but it is difficult to be sure.

The cases I'm considering:

  • strange ordinal behavior (just permutations of strings that end up being equal)
  • Overflowing an integer, causing a positive and negative number for the comparisons, resulting in a 0
  • Anything else crazy that someone more versed in the string.Compare() implementations than I am might know about

There isn't a specific reason for asking this - just curiosity. And I hadn't seen it asked before for C#!

1 answer:

Answer 0 (score: 4)

I believe that the answer to your question is technically yes, depending on which overload you call, and which option parameters you pass in. According to the MSDN docs it is possible to do the comparison with a Culture that has strange rules for ordinal values of characters, or even skips certain characters:

Notes to Callers

Character sets include ignorable characters. The Compare(String, String) method does not consider such characters when it performs a culture-sensitive comparison. For example, if the following code is run on the .NET Framework 4 or later, a culture-sensitive comparison of "animal" with "ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.
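A minimal sketch of the case the docs describe, comparing "animal" with "ani\u00ADmal" both ways. The class and method names are made up for illustration. The culture-sensitive result depends on the runtime and its globalization settings, so no output is claimed for it here; the ordinal result is always nonzero because the code units differ.

```csharp
using System;
using System.Globalization;

public static class IgnorableCharDemo
{
    // "ani" + U+00AD (soft hyphen) + "mal"
    public const string WithSoftHyphen = "ani\u00ADmal";

    // Culture-sensitive comparison: ignorable characters may be skipped.
    public static int CultureSensitive(string a, string b) =>
        string.Compare(a, b, CultureInfo.InvariantCulture, CompareOptions.None);

    // Ordinal comparison: every UTF-16 code unit counts.
    public static int Ordinal(string a, string b) =>
        string.Compare(a, b, StringComparison.Ordinal);

    public static void Main()
    {
        // The docs say this is 0 on .NET Framework 4 or later.
        Console.WriteLine(CultureSensitive("animal", WithSoftHyphen));
        // Never 0: 'm' (U+006D) differs from the soft hyphen (U+00AD).
        Console.WriteLine(Ordinal("animal", WithSoftHyphen));
    }
}
```

So whether two "unequal" strings can return 0 here comes down to whether you count a string with an invisible soft hyphen as genuinely different.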

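If you want to ignore culture and just compare the raw values of two strings, you can call the overload String.Compare(s1, s2, StringComparison.Ordinal). This results in essentially a character-by-character comparison of the UTF-16 code units, so it returns 0 only when the two strings are identical. (StringComparison.OrdinalIgnoreCase is also culture-independent, but it additionally treats letters that differ only in case as equal.) Docs: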

Notes to Callers ... To recognize ignorable characters in your comparison, supply a value of StringComparison.Ordinal or OrdinalIgnoreCase for the comparisonType parameter.
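To make the difference between the two ordinal options concrete, here is a small sketch using the strings from the question. The class and method names are hypothetical. Note that OrdinalIgnoreCase does return 0 for strings that differ only in case, so StringComparison.Ordinal is the one to use for an exact comparison.

```csharp
using System;

public static class OrdinalDemo
{
    // Exact comparison of UTF-16 code units.
    public static int Exact(string a, string b) =>
        string.Compare(a, b, StringComparison.Ordinal);

    // Culture-independent, but letters differing only in case compare equal.
    public static int IgnoringCase(string a, string b) =>
        string.Compare(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(Exact("Herp", "Derp") != 0);        // True
        Console.WriteLine(Exact("Herp", "HERP") != 0);        // True
        Console.WriteLine(IgnoringCase("Herp", "HERP") == 0); // True
    }
}
```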

Note that the definition of "greater" or "lesser" strings is not necessarily obvious. For example, is string "abc" greater or lesser than "abcc"? .NET is pretty clear that it is lesser for the purposes of string comparison. But it's good to read the docs carefully before relying on such edge cases:

The comparison terminates when an inequality is discovered or both strings have been compared. However, if the two strings compare equal to the end of one string, and the other string has characters remaining, the string with remaining characters is considered greater. The return value is the result of the last comparison performed.
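For the ordinal case, the rule quoted above can be sketched by hand. This is only an illustration of the described behavior, not the runtime's actual implementation; the class name is made up, and it assumes both arguments are non-null. It also shows why the integer overflow worry from the question does not apply to a comparison written this way: a char is 16 bits wide, so the difference of two chars always fits in an int.

```csharp
using System;

public static class OrdinalSketch
{
    // Illustrative only: not the runtime's actual implementation.
    // Assumes a and b are non-null.
    public static int Compare(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            // char is 16 bits wide, so this difference cannot overflow an int.
            int diff = a[i] - b[i];
            if (diff != 0) return diff; // stop at the first inequality
        }
        // One string is a prefix of the other, or they are identical:
        // the string with characters remaining is considered greater.
        return a.Length - b.Length;
    }

    public static void Main()
    {
        Console.WriteLine(Compare("abc", "abcc") < 0); // True
        Console.WriteLine(Math.Sign(Compare("abc", "abcc")) ==
                          Math.Sign(string.CompareOrdinal("abc", "abcc"))); // True
    }
}
```

In this sketch the result is 0 only when the lengths match and every code unit matches, which means the strings are identical.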