Get the difference between 2 strings

Time: 2014-12-02 16:52:44

Tags: c# .net string comparison

I'm trying to calculate the difference between two strings.

For example:

string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";

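The result would be a list of strings containing 2 items: "very" and ", Joe".

So far I haven't found much in my research on this task.

EDIT: The result may need to be 2 separate lists of strings, one containing the additions and the other containing the deletions.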

5 answers:

Answer 0 (score: 2)

This is the simplest version I can think of:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        // Extract the words (runs of letters, digits and underscores)
        MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
        MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");

        var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
        var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));

        // Optionally, you can use a custom comparer for the words.
        // var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());

        // After this operation hs2 contains only 'very' and 'Joe'
        hs2.ExceptWith(hs1);

        Console.WriteLine(string.Join(", ", hs2));
    }
}

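Custom comparer:

public class MyComparer : IEqualityComparer<string>
{
    // Case-insensitive equality; the static string.Equals also handles nulls
    public bool Equals(string one, string two)
    {
        return string.Equals(one, two, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: "Joe" and "joe" need the same hash code,
    // which the case-sensitive item.GetHashCode() does not guarantee
    public int GetHashCode(string item)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
    }
}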
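The answer above computes only one direction (the words added in `val2`). Since the question's EDIT asks for separate lists of additions and deletions, here is a minimal sketch that runs `ExceptWith` both ways. It assumes C# 9 or later (top-level statements); `WordDiff` and `Words` are hypothetical helper names, and the comparison is assumed to be case-insensitive via `StringComparer.OrdinalIgnoreCase`, which already provides a matching `Equals`/`GetHashCode` pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var diff = WordDiff("Have a good calm day", "Have a very good day, Joe");
Console.WriteLine("Added: " + string.Join(", ", diff.Added));     // Added: very, Joe
Console.WriteLine("Removed: " + string.Join(", ", diff.Removed)); // Removed: calm

// Hypothetical helper: word tokens, using the same \b(\w+)\b pattern as the answer.
static List<string> Words(string text) =>
    Regex.Matches(text, @"\b(\w+)\b").Cast<Match>().Select(m => m.Value).ToList();

// Hypothetical helper. Additions: words of `after` missing from `before`.
// Deletions: words of `before` missing from `after`. Case-insensitive (an assumption).
static (List<string> Added, List<string> Removed) WordDiff(string before, string after)
{
    var cmp = StringComparer.OrdinalIgnoreCase;
    List<string> oldWords = Words(before);
    List<string> newWords = Words(after);

    var added = new HashSet<string>(newWords, cmp);
    added.ExceptWith(oldWords);   // keep only words that are new
    var removed = new HashSet<string>(oldWords, cmp);
    removed.ExceptWith(newWords); // keep only words that disappeared

    // Walk the original lists so results keep their order; Distinct drops repeats.
    return (newWords.Where(w => added.Contains(w)).Distinct(cmp).ToList(),
            oldWords.Where(w => removed.Contains(w)).Distinct(cmp).ToList());
}
```

Walking the original word lists instead of enumerating the sets is a design choice: `HashSet<T>` makes no promise about enumeration order.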

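Answer 1 (score: 1)

I actually followed these steps:

(i) Obtain all the words from both strings, ignoring special characters

(ii) Find the difference between the two lists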

CODE:

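using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s1 = "Have a good day";
        string s2 = "Have a very good day, Joe";
        IEnumerable<string> diff;

        // (i) Extract words (word characters and apostrophes); the pattern
        //     also yields empty matches, which are filtered out
        MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
        IEnumerable<string> first = from m in matches.Cast<Match>()
                                    where !string.IsNullOrEmpty(m.Value)
                                    select TrimSuffix(m.Value);
        MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
        IEnumerable<string> second = from m in matches1.Cast<Match>()
                                     where !string.IsNullOrEmpty(m.Value)
                                     select TrimSuffix(m.Value);

        // (ii) Subtract the shorter word list from the longer one
        if (second.Count() > first.Count())
        {
            diff = second.Except(first).ToList();
        }
        else
        {
            diff = first.Except(second).ToList();
        }

        Console.WriteLine(string.Join(", ", diff)); // very, Joe
    }

    // Drops everything from the first apostrophe on, e.g. "Joe's" -> "Joe"
    static string TrimSuffix(string word)
    {
        int apostropheLocation = word.IndexOf('\'');
        if (apostropheLocation != -1)
        {
            word = word.Substring(0, apostropheLocation);
        }
        return word;
    }
}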

Output: very, Joe
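The `Count()` comparison picks only one direction of `Except`, so with "Have a good calm day" versus "Have a very good day, Joe" the word "calm" would not be reported. If the goal is "words in either string but not in both", `HashSet<T>.SymmetricExceptWith` does that in one call. A minimal sketch, assuming C# 9 or later (top-level statements), reusing the answer's `\b[\w']*\b` pattern but leaving out `TrimSuffix` for brevity; `Tokens` and `ChangedWords` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

Console.WriteLine(string.Join(", ", ChangedWords("Have a good calm day", "Have a very good day, Joe")));

// Hypothetical helper: same pattern as the answer, with the empty matches filtered out.
static IEnumerable<string> Tokens(string s) =>
    Regex.Matches(s, @"\b[\w']*\b").Cast<Match>()
         .Select(m => m.Value)
         .Where(v => v.Length > 0);

// Hypothetical helper: words that occur in exactly one of the two strings.
static List<string> ChangedWords(string s1, string s2)
{
    var set = new HashSet<string>(Tokens(s1));
    set.SymmetricExceptWith(Tokens(s2)); // removes shared words, adds words only in s2
    return set.ToList();                 // HashSet<T> order is not guaranteed
}
```

The result no longer says which string a word came from; when that matters, two `ExceptWith` calls (one per direction) are the better fit.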

Answer 2 (score: 1)

This code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

enum Where { None, First, Second, Both } // somewhere in your source file

class Program
{
    static void Main()
    {
        var val1 = "Have a good calm day calm calm calm";
        var val2 = "Have a very good day, Joe Joe Joe Joe";

        // Lower-cased tokens: a run of word characters, or (where that fails,
        // e.g. at ",") a non-space run, whitespace and another non-space run
        var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                     where m.Success
                     select m.Value.ToLower();
        var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                     where m.Success
                     select m.Value.ToLower();

        // Record, for each distinct token, which string(s) it occurs in
        var dic = new Dictionary<string, Where>();
        foreach (var s in words1)
        {
            dic[s] = Where.First;
        }
        foreach (var s in words2)
        {
            Where b;
            if (!dic.TryGetValue(s, out b)) b = Where.None;

            switch (b)
            {
                case Where.None:
                    dic[s] = Where.Second;
                    break;
                case Where.First:
                    dic[s] = Where.Both;
                    break;
            }
        }

        // Print the tokens that are not shared by both strings
        foreach (var kv in dic.Where(x => x.Value != Where.Both))
        {
            Console.WriteLine(kv.Key);
        }
    }
}

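This gives 'calm', 'very', ', joe' and 'joe' (lower-cased by ToLower()) as the differences between the two strings: 'calm' from the first, and 'very', ', joe' and 'joe' from the second. Each repeated word is reported only once.

And to get two separate lists showing which word came from which text, append this after the code above: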

var list1 = dic.Where(x => x.Value == Where.First).ToList();
var list2 = dic.Where(x => x.Value == Where.Second).ToList();

foreach (var kv in list1)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}

foreach (var kv in list2)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}
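To see where the ', joe' entry comes from, here is a small sketch that runs only this answer's tokenizer (same pattern, same `ToLower()`). At the comma, `(\w+)` cannot match, so the second alternative `(\S+\s+\S+)` takes the comma, the space and the following word as a single token; that is why ', joe' and 'joe' count as different words. It assumes C# 9 or later (top-level statements); `Tokenize` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Prints each token in brackets so the ", joe" token is easy to spot
foreach (var t in Tokenize("Have a very good day, Joe Joe Joe Joe"))
    Console.WriteLine($"[{t}]");

// Hypothetical helper: the answer's pattern and lower-casing, nothing else.
static string[] Tokenize(string text) =>
    Regex.Matches(text, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
         .Select(m => m.Value.ToLower())
         .ToArray();
```

If ', Joe' should be reported as in the question, this behavior is useful; if plain words are wanted, the pattern `\w+` alone is enough.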

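Answer 3 (score: 0)

Split the characters into two sets, then compute the relative complement of those sets.

The relative complement will be available in any good set library.

You may need to take care to preserve the order of the characters.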
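A relative complement A \ B keeps the items of A that are not in B. One way to respect the ordering caveat is to iterate A itself and use a set built from B only for the membership test. A minimal sketch, assuming C# 9 or later (top-level statements) and whitespace-separated words as the items; `RelativeComplement` is a hypothetical name, and because it is generic it works on characters as well as words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var before = "Have a good calm day".Split(' ');
var after = "Have a very good day".Split(' ');
Console.WriteLine(string.Join(" ", RelativeComplement(after, before))); // very
Console.WriteLine(string.Join(" ", RelativeComplement(before, after))); // calm

// Hypothetical helper: items of `a` that do not occur in `b`.
// Iterating `a` (not a set built from it) preserves a's order and keeps repeats;
// the HashSet only makes each membership test fast.
static List<T> RelativeComplement<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    var exclude = new HashSet<T>(b);
    return a.Where(x => !exclude.Contains(x)).ToList();
}
```

Calling it in both directions gives the separate "added" and "removed" lists from the question's EDIT.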

Answer 4 (score: -1)

You have to remove the ',' to get the expected result:

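using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        string s1 = "Have a good day";
        string s2 = "Have a very good day, Joe";
        int index = s2.IndexOf(','); // get the index of the char to be removed (-1 if absent)
        IEnumerable<string> diff;
        IEnumerable<string> first = s1.Split(' ').Distinct();
        // remove it (guarded, because Remove(-1, 1) throws when there is no comma)
        string cleaned = index >= 0 ? s2.Remove(index, 1) : s2;
        IEnumerable<string> second = cleaned.Split(' ').Distinct();
        if (second.Count() > first.Count())
        {
            diff = second.Except(first).ToList();
        }
        else
        {
            diff = first.Except(second).ToList();
        }
        Console.WriteLine(string.Join(", ", diff)); // very, Joe
    }
}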
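Removing a single comma only handles inputs with exactly one comma and no other punctuation. `string.Split` accepts several separator characters at once, and `StringSplitOptions.RemoveEmptyEntries` drops the empty strings that appear between adjacent separators such as ", ". A minimal sketch keeping the answer's `Distinct`/`Except` logic, assuming C# 9 or later (top-level statements); the separator set and the names `SplitWords` and `Diff` are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(string.Join(", ", Diff("Have a good day", "Have a very good day, Joe"))); // very, Joe

// Assumed separator set: extend it with whatever punctuation your input can contain.
static string[] SplitWords(string s) =>
    s.Split(new[] { ' ', ',', '.', ';', ':', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

// Same logic as the answer: subtract the shorter distinct word list from the longer one.
static List<string> Diff(string s1, string s2)
{
    var first = SplitWords(s1).Distinct().ToList();
    var second = SplitWords(s2).Distinct().ToList();
    return second.Count > first.Count
        ? second.Except(first).ToList()
        : first.Except(second).ToList();
}
```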