Question

I am trying to compute the difference between two strings.

For example:

string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";

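The result would be a list of strings with 2 items: "very" and ", Joe".

So far my research into this task hasn't turned up much.

Edit: The result would probably need to be 2 separate lists of strings, one that holds the additions and one that holds the removals.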

Answer 1

This is the simplest version I can think of:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        // Extract the words of each string, ignoring punctuation.
        MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
        MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");

        var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
        var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));

        // Optionally you can use a custom comparer for the words.
        // var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());

        // After this operation hs2 contains only 'very' and 'Joe'.
        hs2.ExceptWith(hs1);
    }
}

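The custom comparer:

public class MyComparer : IEqualityComparer<string>
{
    public bool Equals(string one, string two)
    {
        // The static overload also handles null arguments.
        return string.Equals(one, two, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string item)
    {
        // The hash must be case-insensitive too, otherwise strings that
        // compare equal could end up in different hash buckets.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
    }
}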
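
Since the question's edit asks for two separate lists (additions and removals), the set approach above can be run in both directions. The following is only a sketch under a few assumptions: the class and method names (`WordDiff`, `Words`, `Added`, `Removed`) are made up for illustration, words are matched with `\w+`, and the built-in `StringComparer.OrdinalIgnoreCase` is used in place of a hand-written comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WordDiff
{
    // Extracts the words of a string into a case-insensitive set.
    public static HashSet<string> Words(string text)
    {
        return new HashSet<string>(
            Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value),
            StringComparer.OrdinalIgnoreCase);
    }

    // Words present in 'after' but not in 'before'.
    public static HashSet<string> Added(string before, string after)
    {
        var result = Words(after);
        result.ExceptWith(Words(before));
        return result;
    }

    // Words present in 'before' but not in 'after'.
    public static HashSet<string> Removed(string before, string after)
    {
        return Added(after, before);
    }

    static void Main()
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        Console.WriteLine("Added:   " + string.Join(", ", Added(val1, val2)));
        Console.WriteLine("Removed: " + string.Join(", ", Removed(val1, val2)));
    }
}
```

Because the result is a `HashSet<string>`, duplicates are dropped and the enumeration order should not be relied on.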

Answer 2

Actually, I followed these steps:

(i) Obtain all words from the two strings, ignoring special characters

(ii) Find the difference between the two lists

CODE:

static void Main()
{
    string s1 = "Have a good day";
    string s2 = "Have a very good day, Joe";
    IEnumerable<string> diff;

    // Words of the first string, with any apostrophe suffix trimmed.
    MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
    IEnumerable<string> first = from m in matches.Cast<Match>()
                                where !string.IsNullOrEmpty(m.Value)
                                select TrimSuffix(m.Value);

    // Words of the second string, processed the same way.
    MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
    IEnumerable<string> second = from m in matches1.Cast<Match>()
                                 where !string.IsNullOrEmpty(m.Value)
                                 select TrimSuffix(m.Value);

    // Subtract the shorter word list from the longer one.
    if (second.Count() > first.Count())
    {
        diff = second.Except(first).ToList();
    }
    else
    {
        diff = first.Except(second).ToList();
    }
}

// Cuts a word at its first apostrophe, e.g. "Joe's" becomes "Joe".
static string TrimSuffix(string word)
{
    int apostropheLocation = word.IndexOf('\'');
    if (apostropheLocation != -1)
    {
        word = word.Substring(0, apostropheLocation);
    }
    return word;
}

Output: very, Joe

Answer 3

This code:

enum Where { None, First, Second, Both } // somewhere in your source file

//...
var val1 = "Have a good calm day calm calm calm";
var val2 = "Have a very good day, Joe Joe Joe Joe";

var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                where m.Success
                select m.Value.ToLower();
var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                where m.Success
                select m.Value.ToLower();

var dic = new Dictionary<string, Where>();
foreach (var s in words1)
{
    dic[s] = Where.First;
}
foreach (var s in words2)
{
    Where b;
    if (!dic.TryGetValue(s, out b)) b = Where.None;

    switch (b)
    {
        case Where.None:
            dic[s] = Where.Second;
            break;
        case Where.First:
            dic[s] = Where.Both;
            break;
    }
}

foreach (var kv in dic.Where(x => x.Value != Where.Both))
{
    Console.WriteLine(kv.Key);
}

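gives us 'calm', 'very', ', Joe' and 'Joe' as the differences between the two strings (printed in lower case because of ToLower()): 'calm' from the first one, and 'very', ', Joe' and 'Joe' from the second. It also removes the repeated cases.

And to get two separate lists that show which word came from which text: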

var list1 = dic.Where(x => x.Value == Where.First).ToList();
var list2 = dic.Where(x => x.Value == Where.Second).ToList();

foreach (var kv in list1)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}

foreach (var kv in list2)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}

Answer 4

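Split the characters into two sets, then compute the relative complement of those sets.

The relative complement will be available in any good set library.

You might need to take care to preserve the order of the characters.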
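
As a rough sketch of the idea above, a relative complement that keeps the original order can be written with a `HashSet<T>` for lookups and a `List<T>` for the result. The names `SetOps` and `RelativeComplement` are hypothetical, chosen only for this example; the method is generic, so it can be applied to characters or to words.

```csharp
using System;
using System.Collections.Generic;

static class SetOps
{
    // Returns the distinct items of 'source' that do not occur in 'other',
    // in the order in which they first appear in 'source'.
    public static List<T> RelativeComplement<T>(IEnumerable<T> source, IEnumerable<T> other)
    {
        var exclude = new HashSet<T>(other);
        var seen = new HashSet<T>();
        var result = new List<T>();

        foreach (var item in source)
        {
            // seen.Add returns false for an item that was already emitted.
            if (!exclude.Contains(item) && seen.Add(item))
            {
                result.Add(item);
            }
        }
        return result;
    }

    static void Main()
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        // Character level: a string is a sequence of chars.
        Console.WriteLine(string.Concat(RelativeComplement(val2, val1)));

        // Word level: split on spaces first (punctuation stays attached).
        Console.WriteLine(string.Join("|", RelativeComplement(val2.Split(' '), val1.Split(' '))));
    }
}
```

At the character level only the characters that never occur in the first string survive, so for the question's use case a word-level split is probably closer to what is wanted.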

Answer 5

You have to remove the ',' to get the expected result:

string s1 = "Have a good day";
string s2 = "Have a very good day, Joe";

// Get the index of the char to be removed (assumes s2 contains a ',').
int index = s2.IndexOf(',');
IEnumerable<string> diff;
IEnumerable<string> first = s1.Split(' ').Distinct();

// Remove the ',' before splitting so that "day," becomes "day".
IEnumerable<string> second = s2.Remove(index, 1).Split(' ').Distinct();

// Subtract the shorter word list from the longer one.
if (second.Count() > first.Count())
{
    diff = second.Except(first).ToList();
}
else
{
    diff = first.Except(second).ToList();
}
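
The snippet above removes only the first comma, and `Remove(index, 1)` throws an `ArgumentOutOfRangeException` when the string contains no comma at all (`IndexOf` returns -1). One possible variation, sketched here with hypothetical names (`SplitDiff`, `Tokens`, `Difference`) and an assumed separator list, is to let `Split` treat punctuation as separators:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SplitDiff
{
    // Assumed set of separators; extend it to match the input at hand.
    static readonly char[] Separators = { ' ', ',', '.', ';', '!', '?' };

    // Splits a string into words, dropping the empty entries that appear
    // between adjacent separators such as ", ".
    public static string[] Tokens(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Distinct words of 'second' that do not occur in 'first'.
    public static List<string> Difference(string first, string second)
    {
        return Tokens(second).Except(Tokens(first)).ToList();
    }

    static void Main()
    {
        string s1 = "Have a good day";
        string s2 = "Have a very good day, Joe";

        Console.WriteLine(string.Join(", ", Difference(s1, s2)));
    }
}
```

Calling `Difference(s2, s1)` gives the words that were removed instead, which covers the two-list requirement from the question's edit.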

Get the difference between 2 strings

5 answers: