Find how many words two strings have in common

Asked: 2013-02-27 10:21:59

Tags: c# sql-server user-defined-functions sqlclr

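I have this function where I want to compare two strings and return how many words they have in common, but the following does not work. I always seem to get 0 for SameWordCount and 1 for MasterAddressWordCount.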

Any ideas?

// some more string cleaning
        mastermkAddressKey = mastermkAddressKey.Replace(",", " ").Replace(".", " ").Trim();
        mastermkAddressKey = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(mastermkAddressKey));
        mastermkAddressKey = mastermkAddressKey.Replace("  ", " |").Replace("| ", "").Replace("|", "");
        mastermkAddressKey = QbaseStrings.RemoveDuplicateWords(mastermkAddressKey);

        duplicatemkAddressKey = duplicatemkAddressKey.Replace(",", " ").Replace(".", " ").Trim();
        duplicatemkAddressKey = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(duplicatemkAddressKey));
        duplicatemkAddressKey = duplicatemkAddressKey.Replace("  ", " |").Replace("| ", "").Replace("|", "");
        duplicatemkAddressKey = QbaseStrings.RemoveDuplicateWords(duplicatemkAddressKey);

        string[] masterAddressSeparateWords = mastermkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
        string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);

        int SameWordCount = 0;
        int MasterAddressWordCount = 0;

        foreach (string masterWord in masterAddressSeparateWords)
        {
            foreach (string duplicateWord in duplicateAddressSeparateWords)
            {
                if (masterWord == duplicateWord) { SameWordCount++; }
            }

            MasterAddressWordCount++;
        }

        int WordDifference = MasterAddressWordCount - SameWordCount;

        if (WordDifference == 0) { return "sure"; }
        if (WordDifference > 0 && WordDifference < 3) { return SameWordCount.ToString() + " " + MasterAddressWordCount.ToString(); }
        if (WordDifference > 2 && WordDifference < 5) { return "possible"; }

2 Answers:

Answer 0 (score: 3)

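Your problem is caused by new char[' ']; what you meant is new char[] {' '}. The compiler (very helpfully) implicitly converts ' ' to an int and uses it as the array length, so the expression has the form new char[int]. This means that: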

new char[' ']

is really the same as:

new char[32]

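which ends up being a useless char[] array of 32 null characters ('\0') rather than the single space you were after. Assuming neither string contains '\0', Split returns each string as one unsplit element, which is why MasterAddressWordCount comes out as 1 and SameWordCount as 0 whenever the two strings differ.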


You can see this clearly by looking at the IL generated for:

var a = new char[' '];

which is:

IL_0001:  ldc.i4.s    20
IL_0003:  newarr      System.Char
IL_0008:  stloc.0     // a

The operand 20 is hexadecimal, which is 32 in decimal.
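
The difference between the two expressions can also be checked directly, without reading IL. The following is a minimal sketch, not the asker's code: the class name SeparatorDemo, the helper CountPieces, and the sample address are all made up for illustration.

```csharp
using System;

public static class SeparatorDemo
{
    // Hypothetical helper: number of pieces Split produces for a separator array.
    public static int CountPieces(string text, char[] separators)
    {
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        // ' ' is implicitly converted to the int 32 and used as the array length.
        char[] wrong = new char[' '];
        // An array initializer gives the intended one-element separator array.
        char[] right = new char[] { ' ' };

        Console.WriteLine(wrong.Length);      // 32
        Console.WriteLine(wrong[0] == '\0');  // True: every element is the null character
        Console.WriteLine(right.Length);      // 1

        string address = "12 High Street London"; // made-up sample input

        Console.WriteLine(CountPieces(address, wrong)); // 1: no '\0' in the text, so nothing is split
        Console.WriteLine(CountPieces(address, right)); // 4
    }
}
```

The single piece returned in the first case matches the MasterAddressWordCount of 1 reported in the question.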

Answer 1 (score: 0)

I solved this by changing the following lines:

string[] masterAddressSeparateWords = mastermkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);

to:

string[] masterAddressSeparateWords = mastermkAddressKey.Split(' ');
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(' ');
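
One caveat: Split(' ') no longer passes StringSplitOptions.RemoveEmptyEntries, so consecutive spaces produce empty entries. That is probably harmless here because the cleaning code already collapses double spaces, but the option can be kept by using a real one-element separator array. The sketch below shows one possible way to do that; it replaces the question's nested loops with a HashSet<string> intersection, and the names WordOverlap and CountSharedWords and the sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class WordOverlap
{
    // One-element separator array: the form the question intended.
    private static readonly char[] Space = { ' ' };

    // Counts the distinct words of `master` that also occur in `duplicate`
    // (ordinal, case-sensitive comparison, like == on strings).
    public static int CountSharedWords(string master, string duplicate)
    {
        var masterWords = new HashSet<string>(
            master.Split(Space, StringSplitOptions.RemoveEmptyEntries));
        masterWords.IntersectWith(
            duplicate.Split(Space, StringSplitOptions.RemoveEmptyEntries));
        return masterWords.Count;
    }

    public static void Main()
    {
        // Without RemoveEmptyEntries, a double space yields an empty entry.
        Console.WriteLine("a  b".Split(' ').Length); // 3

        // Made-up sample addresses; the double space is ignored here.
        Console.WriteLine(CountSharedWords("12 High  Street London", "High Street 12 Leeds")); // 3
    }
}
```

Because the set holds each word once, this counts distinct shared words, which matches the nested-loop count only when duplicate words have already been removed from both strings, as the question's cleaning step does.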