Question

我使用的是VSTS2008 + C＃+ .Net 3.0。这是我的代码和来自ADO.Net的相关例外。这是我输入的二进制形式和文本形式的两个字符串，任何想法有什么不对？为什么ADO.Net将两个不同的字符串视为相同？

异常消息：

An unhandled exception of type 'System.Data.ConstraintException' occurred in System.Data.dll

Additional information: Column 'Name' is constrained to be unique.  Value '������' is already present.

以二进制形式和文本形式输入字符串：

StackOverflow无法正确显示我的字符串代码，这里是我在VSTS 2008编辑器中实际显示内容的屏幕快照。

我的代码：

    static void Main(string[] args)
    {
        string[] buf = new string[] { "����", "������" };

        CompareInfo ci = System.Globalization.CultureInfo.InvariantCulture.CompareInfo;
        ci.Compare(buf[0], buf[1], CompareOptions.IgnoreWidth);
        Console.WriteLine (String.Compare(buf[0], buf[1], StringComparison.InvariantCultureIgnoreCase));

        DataTable bulkInserTable = new DataTable("BulkRequestTable");
        bulkInserTable.CaseSensitive = true;
        DataColumn column = null;
        DataRow row = null;

        // add Keyword column to datatable
        column = new DataColumn();
        column.DataType = System.Type.GetType("System.String");
        column.ColumnName = "Name";
        column.ReadOnly = true;
        column.Unique = true;
        bulkInserTable.Columns.Add(column);

        foreach (string item in buf)
        {
            row = bulkInserTable.NewRow();
            row["Name"] = item;
            bulkInserTable.Rows.Add(row);
        }
    }

Answer 1

我看到你在比较中使用了InvariantCulture。您应该使用Ordinal（逐个字符的文字比较）或CurrentCulture（取代替换 - 如Æ=== AE）来进行比较。

通过输入字符作为Unicode字符串可以获得更好的运气，如：

string text = "\uEFBF\uBDEF\uBFBD\uEFBF";
string text2= "\uEFBF\uBD0D\uAEFB\uFBDE\uFBFB";

我有一些中文/日文字符（完全不粘贴）：

string text = "뷯뾽";
string text2 = "봍껻ﯞﯻ";

CurrentCulture将知道单个符号可以代表其他2个符号，因此它是一个不错的选择。 Ordinal只会注意到长度不同。如果它们的长度相同，并且每个字符的每个Unicode值都相同，那么它将成功。

Answer 2

看起来像字节顺序标记（BOM） - http://en.wikipedia.org/wiki/Byte_order_mark

物料清单可能在比较时被剥离，因此它们会相同吗？

Answer 3

当转换1个字符串以存储在数据表中时，它产生的字符串与另一个字符串相同。所以dataTable上的唯一约束使它抛出异常。

Answer 4

不确定那些很酷的角色所使用的字体或字符集，但它们看起来并不是很好。 Compare（）基于字符串的可排序性而工作，这就是为什么文化对于比较文化敏感字符串非常重要的原因。这些字符串不会从排序角度返回，因此它们实际上是“相同的”。 String.Equals（）方法将它们显示为不同。

buf[0].Equals(buf[1]) = false

不确定为什么需要使用特殊字符，但是如果需要将“唯一”键作为一个问题。我假设数据表使用类似的比较来验证唯一列值，因此将两行视为重复。

Answer 5

或许关闭dataTable唯一的强制约束，并实现自己独特的检查方法？

Answer 6

您的问题可能与您的角色是一个特殊的unicode角色有关。

"The replacement character � (often a black diamond with a white question mark)
is a symbol found in the Unicode standard at codepoint U+FFFD in the Specials 
table. It is used to indicate problems when a system such as a text parser was
not able to decode a stream of data to a correct symbol"

http://en.wikipedia.org/wiki/Unicode_Specials

C＃中非常奇怪的字符串唯一问题

6 个答案: