Check if strings are equal (at least the first 2 letters) and contain a digit

Time: 2015-02-23 14:51:49

Tags: c# .net regex string

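I have a list containing some strings. I want to compare the strings and save the matching ones in a new list.

Matching means:

  • The first 2 characters must be letters
  • The letters (at least the first two) must be equal
  • A digit must follow the equal letters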

List<String> items = new List<string>()
    {
        "ne1234",
        "ne2abc",
        "type",
        "12345",
        "12346"
        "s0",
        "s1",
        "numb4er",
        "numb5er",
        "numb8er",
        "tax1-0",
        "tax1-1"
    };

List<String> equalitems = new List<string>();

Expected content of equalitems:

  • ne*
  • numb*
  • tax*

I couldn't find a good way to solve this. Does anyone have an idea?

3 answers:

Answer 0: (score: 2)

var rx = new Regex(@"^\p{L}{2,}(?=\d)");
var grouped = items.GroupBy(x => rx.Match(x).ToString())
                   .Where(x => x.Key != String.Empty)
                   .ToArray();

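This uses a regular expression to "extract" the leading letters (and to check that a digit follows them), then groups by that key. Finally, it filters out the group with the empty key, which holds the items that don't satisfy your rules.

Note that, as written, the result is a collection of collections (each "matching group" is in a separate collection).

If you want all the matching items together: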
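To make the collection-of-collections shape concrete, here is a minimal sketch that materializes the groups into a dictionary keyed by the letter prefix. The names PrefixGrouping and GroupByPrefix are hypothetical and introduced only for illustration; the regex and the LINQ calls are the ones shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixGrouping
{
    // Same pattern as above: two or more letters at the start, followed by a digit.
    private static readonly Regex Rx = new Regex(@"^\p{L}{2,}(?=\d)");

    // Hypothetical helper: maps each letter prefix to the items that start with it.
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> items)
    {
        return items.GroupBy(x => Rx.Match(x).ToString())   // a failed match yields ""
                    .Where(g => g.Key != String.Empty)      // drop the non-matching items
                    .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With the list from the question, this should give three entries: "ne" with two items, "numb" with three and "tax" with two. Items such as "type", "12345" and "s0" fall into the empty-key group and are dropped.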

var allTogether = items.GroupBy(x => rx.Match(x).ToString())
                       .Where(x => x.Key != String.Empty)
                       .SelectMany(x => x)
                       .ToArray();

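Note that, as written, "letters" means Unicode letters (so it also accepts àéèìòù and non-English letters such as Arabic), and "digits" means Unicode digits.

If you want "standard" letters and digits: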

var rx = new Regex(@"^[A-Za-z]{2,}(?=[0-9])");
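The difference between the two patterns only shows up on non-ASCII input. The sketch below puts them side by side; the class LetterClasses and its two methods are hypothetical names used for illustration. It relies on the default .NET behavior, where \d matches any Unicode decimal digit unless RegexOptions.ECMAScript is set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterClasses
{
    // Unicode letters followed by a Unicode decimal digit.
    private static readonly Regex UnicodeRx = new Regex(@"^\p{L}{2,}(?=\d)");

    // ASCII letters followed by an ASCII digit.
    private static readonly Regex AsciiRx = new Regex(@"^[A-Za-z]{2,}(?=[0-9])");

    // Each method returns the extracted prefix, or "" when the input does not match.
    public static string UnicodeKey(string s) { return UnicodeRx.Match(s).Value; }
    public static string AsciiKey(string s) { return AsciiRx.Match(s).Value; }
}
```

For plain input such as "ne1234" both methods return "ne". For "\u00E0\u00E912" (àé12) only UnicodeKey returns a prefix, and the same holds for "ab\u0661", where the character after the letters is an Arabic-Indic digit.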

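As written, a single element that shares its prefix with nothing else (for example xxxxxx9) is put in a group of its own and returned. Add

.Where(x => x.Count() > 1)

after

.Where(x => x.Key != String.Empty)

to filter out these items.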
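Putting both filters together, a possible end-to-end sketch that produces the list described in the question could look like this. The names SharedPrefixFinder and SharedPrefixes are hypothetical, and appending "*" to each key is an assumption made only to mirror the expected output in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SharedPrefixFinder
{
    private static readonly Regex Rx = new Regex(@"^\p{L}{2,}(?=\d)");

    // Returns one "prefix*" entry per letter prefix shared by at least two items.
    public static List<string> SharedPrefixes(IEnumerable<string> items)
    {
        return items.GroupBy(x => Rx.Match(x).ToString())
                    .Where(g => g.Key != String.Empty)   // drop items that break the rules
                    .Where(g => g.Count() > 1)           // drop prefixes that occur only once
                    .Select(g => g.Key + "*")            // assumed output format: "ne*"
                    .ToList();
    }
}
```

For the question's list with an extra "xxxxxx9" appended, this should return ne*, numb* and tax*. GroupBy yields groups in the order their keys first appear, so the result follows the order of the input.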

Answer 1: (score: 0)

Another technique is to use a custom comparer that implements IEqualityComparer<string>.

The comparer looks like this:

public class MyComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
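        // Compare the derived keys directly; comparing hash codes alone could treat colliding keys as equal.
        return GetStringForHash(x) == GetStringForHash(y);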
    }

    public string GetStringForHash(string x)
    {
        StringBuilder hashString = new StringBuilder();

        if (x.Length > 2 && x.Substring(0, 2).All(v => char.IsLetter(v)))
        {
            for (var i = 0; i < x.Length; i++)
            {
                if (char.IsLetter(x[i]))
                {
                    hashString.Append(x[i]);
                }
                else if (char.IsDigit(x[i]))
                {
                    return hashString.ToString();
                }
                else
                {
                    break;
                }
            }
        }

        return x;
    }

    public int GetHashCode(string x)
    {
        return GetStringForHash(x).GetHashCode();
    }
}

It is used like this:

List<String> items = new List<string>()
    {
    "ne1234",
    "ne2abc",
    "type",
    "12345",
    "12346",
    "s0",
    "s1",
    "numb4er",
    "numb5er",
    "numb8er",
    "tax1-0",
    "tax1-1"
};

var comparer = new MyComparer();
var results = items.GroupBy(v => comparer.GetStringForHash(v), comparer).Where(v => v.Count() > 1);

List<String> equalitems = results.SelectMany(v => v).ToList();

Since @xanatos's regular expression is spot on, here is an alternative version of GetStringForHash that uses it:

public string GetStringForHash(string x)
{
    var rx = new Regex(@"^\p{L}{2,}(?=\d)");

    var match = rx.Match(x);

    return match.Success ? match.ToString() : x;
}
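Because GetStringForHash runs for every item (and again for every comparison), one possible refinement is to build the Regex once and reuse it. The sketch below assumes a hypothetical static class PrefixKey; the behavior is meant to be the same as the version above.

```csharp
using System.Text.RegularExpressions;

public static class PrefixKey
{
    // Built once and reused; Regex instances are safe to share between calls.
    private static readonly Regex Rx = new Regex(@"^\p{L}{2,}(?=\d)");

    // Returns the letter prefix when the rules match, otherwise the string itself.
    public static string GetStringForHash(string x)
    {
        Match match = Rx.Match(x);
        return match.Success ? match.Value : x;
    }
}
```

Here "ne1234" maps to "ne", while a non-matching string such as "type" maps to itself, so it ends up alone in its group and is removed by the Count() > 1 filter.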

Answer 2: (score: 0)

I had a fun idea :)

First, create the character sets:

    private static char[] chars = new char[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' };
    private static char[] nums = new char[] { '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' };

Then create an extension method that cuts a string off at its first numeric character (we will use it later):

public static class Extensions
{
    public static string RemoveAfterNumber(this string str)
    {
        int ix = str.IndexOfAny("0123456789".ToCharArray());
        if (ix >= 0)
        {
            return str.Substring(0, ix);
        }
        else
        {
            return null;
        }
    }
}

Then use the following code:

// Keep strings longer than two characters whose first two characters are both letters (Condition 1).
// All(...) is used instead of Intersect(...).Count() == 2, which would reject repeated letters such as "aa1".
List<String> pass1 = items.Where(x => x.Length > 2 && x.Substring(0, 2).All(c => chars.Contains(c))).ToList();

// Get the distinct two-letter prefixes that are shared by more than one string
List<string> pass2 = pass1.GroupBy(x => x.Substring(0, 2)).Where(c => c.Count() > 1).Select(s => s.Key).ToList();

// Get all records starting with these letters
List<string> pass3a = items.Where(x => pass2.Contains(x.Substring(0, 2))).Select(x => x).ToList();

// Cut each record at its first numeric character and drop records that contain no numeric character. This is also where we use the extension method.
List<string> pass3b = pass3a.Where(x => x.RemoveAfterNumber() != null).Select(x => x.RemoveAfterNumber()).ToList();

// Group by the repetitions and format as you require
List<string> result = pass3b.Select(x => x + "*").Distinct().ToList();

I don't claim this is the best solution, or the most logical one. But it was fun to code :)