I have a list containing some strings, and I want to compare the strings and save the matching ones in a new list.
Matching means:
List<String> items = new List<string>()
{
    "ne1234",
    "ne2abc",
    "type",
    "12345",
    "12346",
    "s0",
    "s1",
    "numb4er",
    "numb5er",
    "numb8er",
    "tax1-0",
    "tax1-1"
};
List<String> equalitems = new List<string>();
equalitems
I can't find a good way to solve this. Does anyone have an idea?
Answer 0 (score: 2)
var rx = new Regex(@"^\p{L}{2,}(?=\d)");
var grouped = items.GroupBy(x => rx.Match(x).ToString())
    .Where(x => x.Key != String.Empty)
    .ToArray();
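This uses a regular expression to "extract" the leading letters (and checks that a digit follows the letters), then groups by that key. Finally, it filters out the groups with an empty key (that is, the items that do not satisfy the rule you described).
Note that, as written, the result is a collection of collections (each "matching group" is in its own collection).
If you want all the matching items together: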
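To make the shape of that result concrete, here is a minimal sketch that runs the same grouping and returns each key together with its members. The class name `PrefixGrouping` and the method `GroupByPrefix` are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixGrouping
{
    // Same pattern as above: two or more leading letters, followed by a digit.
    private static readonly Regex Rx = new Regex(@"^\p{L}{2,}(?=\d)");

    // Hypothetical helper: maps each extracted prefix to the items that share it.
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> items)
    {
        return items
            .GroupBy(x => Rx.Match(x).ToString())   // non-matching items get the key ""
            .Where(g => g.Key != String.Empty)      // drop the non-matching items
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With the sample list from the question, this should produce three groups, keyed `ne`, `numb` and `tax`.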
var allTogether = items.GroupBy(x => rx.Match(x).ToString())
    .Where(x => x.Key != String.Empty)
    .SelectMany(x => x)
    .ToArray();
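Note that, as written, "letters" means Unicode letters (so it also matches àéèìòù and non-English letters such as Arabic ones), and "digits" means Unicode digits.
If you want only "standard" letters and digits: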
var rx = new Regex(@"^[A-Za-z]{2,}(?=[0-9])");
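The difference between the two patterns only shows up on non-ASCII input. The sketch below puts both side by side; the class `PrefixPatterns` and its two helper methods are hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class PrefixPatterns
{
    // \p{L} matches any Unicode letter; \d matches any Unicode decimal digit.
    public static readonly Regex UnicodeRx = new Regex(@"^\p{L}{2,}(?=\d)");

    // ASCII-only variant: plain English letters and the digits 0-9.
    public static readonly Regex AsciiRx = new Regex(@"^[A-Za-z]{2,}(?=[0-9])");

    // Each helper returns the extracted prefix, or "" when the pattern does not match.
    public static string UnicodePrefix(string s) => UnicodeRx.Match(s).Value;
    public static string AsciiPrefix(string s) => AsciiRx.Match(s).Value;
}
```

For example, `"\u00E9t\u00E91"` (the word "été" followed by `1`) yields the prefix "été" with the Unicode pattern and an empty string with the ASCII one. A string such as `"ab\u0661"` (ending in an Arabic-Indic digit) likewise matches only the Unicode pattern.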
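As written, a single element that matches nothing else (for example xxxxxx9) is put in a group of its own and returned. To filter such items out, add
.Where(x => x.Count() > 1)
after
.Where(x => x.Key != String.Empty)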
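Putting the pieces together, a minimal end-to-end sketch could look like the following. It assumes the goal is a flat `equalitems` list, as in the question; the class `EqualItemsFinder` and the method `FindEqualItems` are names invented here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EqualItemsFinder
{
    private static readonly Regex Rx = new Regex(@"^\p{L}{2,}(?=\d)");

    // Hypothetical helper: returns every item whose letter prefix is shared
    // with at least one other item, flattened into a single list.
    public static List<string> FindEqualItems(IEnumerable<string> items)
    {
        return items
            .GroupBy(x => Rx.Match(x).ToString())
            .Where(g => g.Key != String.Empty)  // drop items without a valid prefix
            .Where(g => g.Count() > 1)          // drop prefixes that occur only once
            .SelectMany(g => g)
            .ToList();
    }
}
```

Called as `List<string> equalitems = EqualItemsFinder.FindEqualItems(items);`, this keeps the `ne`, `numb` and `tax` items from the sample list and drops a lone item such as `xxxxxx9`.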
Answer 1 (score: 0)
Another technique is to use a custom comparer that implements IEqualityComparer<string>.
The comparer looks like this:
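using System.Collections.Generic;
using System.Linq;
using System.Text;

public class MyComparer : IEqualityComparer<string>
{
    // Two strings are equal when they normalize to the same key.
    // Comparing the keys themselves (rather than their hash codes)
    // avoids false matches caused by hash collisions.
    public bool Equals(string x, string y)
    {
        return GetStringForHash(x) == GetStringForHash(y);
    }

    // Returns the leading letters when the string starts with at least two
    // letters and the run of letters is followed by a digit; otherwise
    // returns the string unchanged.
    public string GetStringForHash(string x)
    {
        StringBuilder hashString = new StringBuilder();
        if (x.Length > 2 && x.Substring(0, 2).All(v => char.IsLetter(v)))
        {
            for (var i = 0; i < x.Length; i++)
            {
                if (char.IsLetter(x[i]))
                {
                    hashString.Append(x[i]);
                }
                else if (char.IsDigit(x[i]))
                {
                    return hashString.ToString();
                }
                else
                {
                    break;
                }
            }
        }
        return x;
    }

    public int GetHashCode(string x)
    {
        return GetStringForHash(x).GetHashCode();
    }
}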
Using it looks like this:
List<String> items = new List<string>()
{
"ne1234",
"ne2abc",
"type",
"12345",
"12346",
"s0",
"s1",
"numb4er",
"numb5er",
"numb8er",
"tax1-0",
"tax1-1"
};
var comparer = new MyComparer();
var results = items.GroupBy(v => comparer.GetStringForHash(v), comparer).Where(v => v.Count() > 1);
List<String> equalitems = results.SelectMany(v => v).ToList();
Since @xanatos's regular expression is spot on, here is an alternative version of GetStringForHash that uses it:
// Built once and reused, instead of constructing a new Regex on every call.
private static readonly Regex Rx = new Regex(@"^\p{L}{2,}(?=\d)");

public string GetStringForHash(string x)
{
    var match = Rx.Match(x);
    return match.Success ? match.Value : x;
}
Answer 2 (score: 0)
I had a fun idea :)
First, create the character sets:
private static char[] chars = new char[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' };
private static char[] nums = new char[] { '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' };
Then create an extension method that strips everything from the first digit onward (we will use it later):
public static class Extensions
{
    public static string RemoveAfterNumber(this string str)
    {
        int ix = str.IndexOfAny("0123456789".ToCharArray());
        if (ix >= 0)
        {
            return str.Substring(0, ix);
        }
        else
        {
            return null;
        }
    }
}
Then use the following code:
// Keep strings longer than two characters whose first two characters are both letters (Condition 1)
List<string> pass1 = items.Where(x => x.Length > 2 && x.Take(2).All(c => chars.Contains(c))).ToList();
// Get the distinct two-letter prefixes that occur more than once
List<string> pass2 = pass1.GroupBy(x => x.Substring(0, 2)).Where(c => c.Count() > 1).Select(s => s.Key).ToList();
// Get all records starting with one of these prefixes
List<string> pass3a = items.Where(x => x.Length >= 2 && pass2.Contains(x.Substring(0, 2))).ToList();
// Cut each record at its first digit and drop records that contain no digit. This is where we use the extension method.
List<string> pass3b = pass3a.Select(x => x.RemoveAfterNumber()).Where(x => x != null).ToList();
// Remove duplicates and format as required
List<string> result = pass3b.Select(x => x + "*").Distinct().ToList();
I don't claim this is the best solution, or the most logical one. But it was fun to code :)