I am trying to calculate the difference between two strings.
For example:
string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";
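The result would be a list of strings with 2 items: "very" and ", Joe".
So far, my research into this task has not turned up much.
Edit: the result would probably need to be 2 separate lists of strings, one holding the additions and one holding the removals.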
Answer 0 (score: 2)
This is the simplest version I can think of:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        // Extract the words of each string.
        MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
        MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");

        var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
        var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));

        // Optionally, you can use a custom comparer for the words:
        // var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());

        // After this operation hs2 contains only 'very' and 'Joe'.
        hs2.ExceptWith(hs1);
    }
}

public class MyComparer : IEqualityComparer<string>
{
    public bool Equals(string one, string two)
    {
        return string.Equals(one, two, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string item)
    {
        // Must agree with Equals: strings that differ only in case need the same hash code.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
    }
}
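The edit to the question asks for two separate lists, one of additions and one of removals. A minimal sketch of that, assuming the same word-level matching as above and that word order and repeated words do not matter. The `WordDiff` class and its `Words` and `Compute` methods are hypothetical names, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordDiff
{
    // Hypothetical helper: collects the distinct words of a text into a set.
    private static HashSet<string> Words(string text)
    {
        return new HashSet<string>(
            Regex.Matches(text, @"\b\w+\b").Cast<Match>().Select(m => m.Value));
    }

    // added:   words that occur only in newText.
    // removed: words that occur only in oldText.
    public static void Compute(string oldText, string newText,
        out List<string> added, out List<string> removed)
    {
        HashSet<string> oldWords = Words(oldText);
        HashSet<string> newWords = Words(newText);

        added = newWords.Where(w => !oldWords.Contains(w)).ToList();
        removed = oldWords.Where(w => !newWords.Contains(w)).ToList();
    }
}
```

With the two strings from the question, `added` holds "very" and "Joe" and `removed` is empty. The comparison is case-sensitive; passing a comparer such as `StringComparer.OrdinalIgnoreCase` to the `HashSet<string>` constructor would be one way to relax that.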
Answer 1 (score: 1)
Actually, I followed these steps:
(i) Obtain all the words from the two strings, ignoring special characters
(ii) Find the difference between the two lists
CODE:
static void Main(string[] args)
{
    string s1 = "Have a good day";
    string s2 = "Have a very good day, Joe";
    IEnumerable<string> diff;

    // The pattern can also produce empty matches, so those are filtered out.
    MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
    IEnumerable<string> first = from m in matches.Cast<Match>()
                                where !string.IsNullOrEmpty(m.Value)
                                select TrimSuffix(m.Value);

    MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
    IEnumerable<string> second = from m in matches1.Cast<Match>()
                                 where !string.IsNullOrEmpty(m.Value)
                                 select TrimSuffix(m.Value);

    // Subtract the shorter word list from the longer one.
    if (second.Count() > first.Count())
    {
        diff = second.Except(first).ToList();
    }
    else
    {
        diff = first.Except(second).ToList();
    }
}

// Drops everything from the first apostrophe onward, e.g. "Joe's" becomes "Joe".
static string TrimSuffix(string word)
{
    int apostropheLocation = word.IndexOf('\'');
    if (apostropheLocation != -1)
    {
        word = word.Substring(0, apostropheLocation);
    }
    return word;
}
OUTPUT: very, Joe
Answer 2 (score: 1)
This code:
enum Where { None, First, Second, Both } // somewhere in your source file
//...
var val1 = "Have a good calm day calm calm calm";
var val2 = "Have a very good day, Joe Joe Joe Joe";

var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
             where m.Success
             select m.Value.ToLower();
var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
             where m.Success
             select m.Value.ToLower();

var dic = new Dictionary<string, Where>();
foreach (var s in words1)
{
    dic[s] = Where.First;
}
foreach (var s in words2)
{
    Where b;
    if (!dic.TryGetValue(s, out b)) b = Where.None;
    switch (b)
    {
        case Where.None:
            dic[s] = Where.Second;
            break;
        case Where.First:
            dic[s] = Where.Both;
            break;
    }
}
foreach (var kv in dic.Where(x => x.Value != Where.Both))
{
    Console.WriteLine(kv.Key);
}
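gives us 'calm', 'very', ', joe' and 'joe' as the differences between the two strings: 'calm' from the first one, and 'very', ', joe' and 'joe' from the second. It also removes repeated words.
And to get two separate lists that show which word came from which text: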
var list1 = dic.Where(x => x.Value == Where.First).ToList();
var list2 = dic.Where(x => x.Value == Where.Second).ToList();
foreach (var kv in list1)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}
foreach (var kv in list2)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}
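Answer 3 (score: 0)
Break the characters into two sets, then compute the relative complement of those sets.
A relative complement operation will be available in any good set library.
You may have to take care to preserve the order of the characters.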
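A minimal sketch of this idea in C#, assuming a case-sensitive comparison and keeping the surviving characters in the order they appear in the second string. `CharDiff` and `RelativeComplement` are hypothetical names chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CharDiff
{
    // Returns the characters of 'second' that never occur in 'first',
    // in their original order (repeats in 'second' are kept).
    public static string RelativeComplement(string first, string second)
    {
        var seen = new HashSet<char>(first);
        return new string(second.Where(c => !seen.Contains(c)).ToArray());
    }
}
```

For the strings in the question, `CharDiff.RelativeComplement("Have a good day", "Have a very good day, Joe")` returns "r,J", because every other character of the second string already occurs somewhere in the first. A character-level set difference therefore does not recover whole words such as "very", which is worth keeping in mind when choosing between this and the word-level approaches.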
Answer 4 (score: -1)
You have to remove the ',' in order to get the expected result:
string s1 = "Have a good day";
string s2 = "Have a very good day, Joe";
int index = s2.IndexOf(','); // get the index of the char to be removed
IEnumerable<string> diff;
IEnumerable<string> first = s1.Split(' ').Distinct();
IEnumerable<string> second = s2.Remove(index, 1).Split(' ').Distinct(); // remove it
if (second.Count() > first.Count())
{
    diff = second.Except(first).ToList();
}
else
{
    diff = first.Except(second).ToList();
}
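Removing a single comma by index only works when the string contains exactly one comma (`Remove` throws if `IndexOf` returns -1). A possible variation, sketched under the assumption that a fixed list of separator characters is good enough, is to let `Split` discard the punctuation. The `SplitDiff` class, the `NewWords` method and the separator list are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitDiff
{
    // Hypothetical separator list; extend it with any other punctuation to ignore.
    private static readonly char[] Separators = { ' ', ',', '.', ';', '!', '?' };

    // Returns the distinct words of s2 that do not occur in s1.
    public static List<string> NewWords(string s1, string s2)
    {
        string[] first = s1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        string[] second = s2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        return second.Except(first).ToList();
    }
}
```

Called with the two strings above, `NewWords` returns a list containing "very" and "Joe", and it does not fail when the second string has no comma at all.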