Regular expression on a file name

Time: 2012-08-20 14:35:35

Tags: c# regex

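I have a dwg file named SMITH 3H FINAL 03-26-2012.dwg, and I am trying to find the right regular expression to validate it, because I will receive about 100 files a week and need to check that each file name is in the correct format. I know very little about regular expressions. I found the code below, but it does not report the name as valid. If I am reading the first line correctly, is it expecting a comma in the file name, and is that why the name fails validation?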

string filenamePattern = String.Concat("^",
    "([a-z',-.]+\\s+)+",  // HARRIS, SMITH
    "(\\d{1,2}-\\d{1,2}){1}\\s+",  // 09-06
    "([a-z]+\\s)*",  //
    "((\\#?\\s*(\\d(\\s*|,))*\\d*-\\d+-?H?D?\\d*?),*\\s+(&\\s)*)+",  // #5,6-11H & #4,7,8-11H2, etc
    "([a-z()-]+\\s)*",  // CLIP-OUT (FINAL)
    "(\\d{1,2}-\\d{1,2}(-\\d{2}|-\\d{4})){1}",  // 05-11-2009
    "\\.dwg", // .dwg
    "$");
RegexOptions options = (RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.IgnoreCase);
Regex reg = new Regex(filenamePattern, options);
if (reg.IsMatch(filename))
{
    valid = true;
}

2 answers:

Answer 0 (score: 3)

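Based on your comments on the other answer, try this (keep RegexOptions.IgnoreCase from your code, because the character classes are written in lowercase):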

^[a-z]+(?:[ -][a-z]+)*\s+\d+H\s+[a-z]+\s+\d{2}-\d{2}-\d{4}\.dwg$

Explanation

The regular expression:

(?-imsx:^[a-z]+(?:[ -][a-z]+)*\s+\d+H\s+[a-z]+\s+\d{2}-\d{2}-\d{4}\.dwg$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  [a-z]+                   any character of: 'a' to 'z' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [ -]                     any character of: ' ', '-'
----------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
----------------------------------------------------------------------
  H                        'H'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  [a-z]+                   any character of: 'a' to 'z' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  dwg                      'dwg'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
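
As a rough C# sketch of how this pattern might be wired into the validation step: the class and method names below are illustrative assumptions, not part of the question. The one detail worth noting is `RegexOptions.IgnoreCase`, since the sample name is uppercase while the pattern uses `[a-z]` and `dwg`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the name FileNameValidator is an assumption for illustration.
public static class FileNameValidator
{
    // The pattern from this answer. IgnoreCase lets [a-z] accept "SMITH" and "FINAL".
    private static readonly Regex Pattern = new Regex(
        @"^[a-z]+(?:[ -][a-z]+)*\s+\d+H\s+[a-z]+\s+\d{2}-\d{2}-\d{4}\.dwg$",
        RegexOptions.IgnoreCase);

    // True when the whole file name has the shape NAME <digits>H WORD dd-dd-dddd.dwg.
    public static bool IsValid(string fileName)
    {
        return Pattern.IsMatch(fileName);
    }
}
```

With this sketch, `FileNameValidator.IsValid("SMITH 3H FINAL 03-26-2012.dwg")` returns true, while a name missing the `3H` part, such as `SMITH FINAL 03-26-2012.dwg`, is rejected.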

Answer 1 (score: 1)

This is how I would do it:

// Group 1 (\w*) is the name, followed by a space.
// Group 2 (\w{2}) is the "3H" part (exactly two characters), followed by a space.
// Group 3 (\w*) is a word such as FINAL, followed by a space.
// Group 4 (\d{2}-\d{2}-\d{4}) is the date in the form mm-dd-yyyy or dd-mm-yyyy.
Regex reg = new Regex(@"(\w*)\s(\w{2})\s(\w*)\s(\d{2}-\d{2}-\d{4})\.dwg");
if (reg.IsMatch(filename))
{
    valid = true;
}

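You can then also get each group. Note that I do not have the regex validate the class period (or what I assume is a class period, the "#5,6-11H & #4,7,8-11H2, etc" part). This gives you a basic framework; you can then pull out that group and check it in code, which keeps the regex cleaner.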
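
A minimal sketch of that "pull out the groups and check them in code" idea follows. The helper name, the added `^` and `$` anchors, and the `MM-dd-yyyy` date format are my assumptions, not part of the original answer; the anchors are there because `IsMatch` on an unanchored pattern also accepts names that merely contain a match.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; names are assumptions for illustration only.
public static class DwgNameParser
{
    // Same groups as the pattern above, with ^ and $ added (an assumption)
    // so that the whole file name has to match, not just part of it.
    private static readonly Regex Pattern = new Regex(
        @"^(\w*)\s(\w{2})\s(\w*)\s(\d{2}-\d{2}-\d{4})\.dwg$");

    // Returns true when the name matches and group 4 is a real calendar date.
    // The MM-dd-yyyy order is assumed from the sample "03-26-2012".
    public static bool TryParse(string fileName, out string name, out DateTime date)
    {
        name = string.Empty;
        date = default(DateTime);

        Match m = Pattern.Match(fileName);
        if (!m.Success)
        {
            return false;
        }

        // The regex only checks digit counts; this rejects dates such as 13-45-2012.
        if (!DateTime.TryParseExact(m.Groups[4].Value, "MM-dd-yyyy",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
        {
            return false;
        }

        name = m.Groups[1].Value;
        return true;
    }
}
```

Checking the date in code rather than in the pattern keeps the regex short, at the cost of a second validation step.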

Edit:

Based on @DaBears's requirements, I came up with the following:

Regex reg = new Regex(@"(\w*|\w*-\w*|\w*\s\w*)\s(\w{2})\s(\w*)\s(\d{2}-\d{2}-\d{4})\.dwg");
if (reg.IsMatch(filename))
{
    valid = true;
}

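This will match a plain last name, a hyphenated name, or a last name containing a space, and gives you whichever form was used in the first group.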
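
To illustrate how the first group behaves for the three name shapes, here is a small sketch that uses the edited pattern unchanged. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the name NameGroupDemo is an assumption for illustration.
public static class NameGroupDemo
{
    // The edited pattern from this answer, copied as-is (it is not anchored).
    private static readonly Regex Pattern = new Regex(
        @"(\w*|\w*-\w*|\w*\s\w*)\s(\w{2})\s(\w*)\s(\d{2}-\d{2}-\d{4})\.dwg");

    // Extracts group 1 (the name part) when the file name matches.
    public static bool TryGetName(string fileName, out string name)
    {
        Match m = Pattern.Match(fileName);
        name = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

For `SMITH 3H FINAL 03-26-2012.dwg`, `SMITH-JONES 3H FINAL 03-26-2012.dwg`, and `VAN DYKE 3H FINAL 03-26-2012.dwg`, the extracted name is `SMITH`, `SMITH-JONES`, and `VAN DYKE` respectively; the regex engine tries the alternatives in order and backtracks until the rest of the pattern fits.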